# Data Distribution

Time
9 hours 53 minutes
Difficulty
Intermediate
CEU/CPE
10
Video Transcription
00:00
Hi, guys. Welcome back. I'm Katherine McKeever, and this is your Lean Six Sigma Green Belt. So today we're going to go over data distribution. We've talked about it in kind of loose terms working up to this point, but in this module we're really going to just talk about the conceptual background
00:17
for data distribution. From that, I want you to understand what the point is of studying distribution. Why are we taking the time
00:24
to work on this? And then I want to introduce to you our Green Belt data set. So this is the data you're gonna be familiar with through the rest of the course.
00:33
So the first thing is, we talk a little bit about distributions. It becomes a really big deal later on in the course, when we talk about the different distribution patterns, what atypical looks like, and statistical process control. But really, let's ask ourselves, why do we care? Distributions are the way that we tell that story. So when we talk about
00:53
data-driven decision making, what we're talking about is
00:56
how do we construct our data such that we're able to create actionable insights, so data that tells us something. One of the three hallmarks of the culture of kaizen, that culture of continuous improvement, is data-driven decision making. And we have talked about it throughout the course. When we're talking about graphical analysis,
01:15
01:18
we process large amounts of information more easily when it's in visual, graphical form. So distributions are a graphical form of the probability of a specific value as the result of the process.
01:32
So what this is, is aggregated information. We run the process. We take all of the data that we've collected with our sampling plan and
01:40
our data collection plans from our measure phase. And then we start aggregating it together, and that's going to be our descriptive statistics, so our measures of central tendency and our measures of dispersion. From that, we're able to convert the information into inferential statistics
01:59
using our distribution. So if I run a process 100 times,
02:04
what is the likelihood that I'm going to get the result 87? That's what distributions tell us, and that's how it links to inferential statistics. But really, the most important takeaway is that the distributions tell us the story of the process, and we'll talk about that in a couple of slides.
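The "run a process 100 times, how likely is 87?" question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the process here is hypothetical (simulated whole-number results centered near 85), and `empirical_distribution` is a helper name made up for this example, not something from the course material.

```python
import random
from collections import Counter


def empirical_distribution(results):
    """Map each observed value to its share of all observations."""
    counts = Counter(results)
    total = len(results)
    return {value: count / total for value, count in counts.items()}


# Hypothetical process (an assumption for illustration):
# 100 runs, each result a whole number scattered around 85.
rng = random.Random(42)
runs = [round(rng.gauss(85, 3)) for _ in range(100)]

dist = empirical_distribution(runs)

# The estimated likelihood of seeing exactly 87 is the share of runs that hit 87.
p_87 = dist.get(87, 0.0)
print(f"Estimated chance of getting 87: {p_87:.2f}")
```

The shares across all observed values add up to 1, which is what lets the aggregated data be read as a probability for any single result.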
02:23
So the next thing about why you want to learn about distributions, and this is kind of stepping back and putting ourselves into a researcher's perspective. So we know that, mathematically, distributions exist. We can show these different values,
02:38
and that's all cool and fine and dandy. I mean, I can create models that will tell you anything in the system,
02:43
but at the end of the day, the question is: is this real? So is this a model that we have created in the system, and can we see this pattern in real life? So we're gonna fast forward to Sir Francis Galton,
02:54
who created the quincunx, also known as the bean machine. So you may have seen a Galton board on Amazon. You flip it over. It's got a bunch of little pegs and some little beads in it, and every time you flip it over,
03:07
you should get a pattern that looks roughly like your normal distribution. What that shows us is that if you take an equal number of process runs, so our beans in this case, and put them through an equal number of steps, or pegs in this case,
03:27
you will still see some variation. Remember, some variation is natural in your process, but at the end of the day you will see what we call the normal distribution.
03:37
So we talk about it quite a bit because it's the way that data behaves in nature. Same marbles, same pegs. You can flip it over and over again, and you'll still see this normal distribution, which validates our understanding from a mathematical perspective.
03:53
When we plot all of these data points on a distribution or a curve, we can say, OK, this is what it looks like.
04:00
Now let's mirror it in reality. And this is really important because when you start thinking about how your data tells the story, you and I both know that you can say anything you want from a statistics perspective.
04:14
So being able to understand what it looks like in reality was very important to those initial statisticians who were proposing this.
04:21
So, the quincunx. It is the basis of probability and distributions as we understand them. Really interesting cocktail conversation for you, but what it shows us is that it mirrors reality and mathematics, so it's a validation of our senses.
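The Galton board is easy to imitate in code. The sketch below assumes each bead bounces left or right with equal chance at every peg, so the bin it lands in is just its number of right bounces. The bead count, row count, and the `galton_board` function name are illustrative choices, not values from the video. Strictly speaking the counts follow a binomial distribution, which takes on the bell shape of the normal distribution as rows and beads increase.

```python
import random
from collections import Counter


def galton_board(num_beads, num_rows, seed=None):
    """Drop beads through rows of pegs; return how many land in each bin.

    At every peg a bead bounces left (0) or right (1) with equal chance,
    so its final bin is simply its number of right bounces.
    """
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(num_beads):
        position = sum(rng.randint(0, 1) for _ in range(num_rows))
        bins[position] += 1
    # One bin per possible number of right bounces: 0 through num_rows.
    return [bins[i] for i in range(num_rows + 1)]


counts = galton_board(num_beads=1000, num_rows=12, seed=1)

# Text histogram: one '#' per 10 beads in the bin.
for bin_index, count in enumerate(counts):
    print(f"{bin_index:2d} | {'#' * (count // 10)}")
```

Changing the seed changes the exact counts, which is the natural variation mentioned above, but the pile-up in the middle bins stays.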
04:40
So when we talk about distributions and I said it tells the story, the reason why we study distributions is because distributions show, not tell. So if you remember from high school English, when you were writing your interpretive fiction, your teacher probably told you to show, not tell, when you're talking about
04:59
situations or scenarios or scenes,
05:01
that's exactly what a distribution does for you here. So what we're looking at in this graph is actually three different distributions presented two different ways. So on the top row, you are looking at your histogram distributions.
05:18
So we've talked about it a lot. I've told you it's gonna be very important.
05:23
On the bottom row, what you are looking at is called a box plot, or a box-and-whisker plot. These are the exact same data sets. It's just that one is easier to read than the other, so both of them show us the same information. So on the far right, we have a skewed-left distribution,
05:42
which tells us that we have a longer tail on the left. When we get to atypical distributions,
05:46
that would make sense: the majority of our values are above our median, or
05:55
our halfway point in our data. So when we start thinking about how our descriptive statistics apply: in the middle, in pink (of course it's pink), is our normal distribution. This is what we expect to look at. And the reason why I wanted to show the two different ways to show your distributions is when we talk about data telling a story
06:13
and being easy to consume.
06:15
Unless you're familiar with reading a box plot, the lower graphs are going to require more time to digest. As the practitioner, it's going to require you to explain what you're observing. Just so you know, the middle areas are where you have the central tendency.
06:33
The whiskers or the external areas
06:36
are going to be your measures of dispersion. And then, in this particular case, your two little pink dots, top or bottom, are going to be outliers. That being said, looking at your normal distribution right down the middle, or your peaked symmetric, you can see very easily
06:55
that most of your data centralizes around 16, with one standard deviation being four, so 12 and 20 being your one standard deviation from the mean.
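The arithmetic behind "12 and 20" can be checked with Python's standard library. This sketch takes the mean of 16 and standard deviation of 4 from the slide and assumes the data is exactly normal, which real process data will only approximate.

```python
from statistics import NormalDist

# Values read off the slide; exact normality is an assumption.
mean, std_dev = 16, 4
process = NormalDist(mu=mean, sigma=std_dev)

# One standard deviation either side of the mean.
lower, upper = mean - std_dev, mean + std_dev  # 12 and 20

# Share of values expected between the two bounds (about 68% for a normal curve).
share_within_one_sd = process.cdf(upper) - process.cdf(lower)
print(f"{lower} to {upper}: {share_within_one_sd:.1%} of values")
```

So for a normal curve centered at 16, roughly two thirds of results are expected to land between 12 and 20.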
07:03
07:05
It's important to take away from this that you, as the Green Belt, are going to have to make decisions on how you convey your information. Distributions are the most common way to convey it, and my recommendation to you would be to continue using that method because it's what people are used to consuming.
07:26
07:29
They're going to be familiar with the idea of a histogram. Box plots, or some of the other fancier ways of displaying information that we'll talk about in Black Belt, require a little bit more interpretation, and every time you add a layer of interpretation, you potentially lose some of the meaning in your data.
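To see why a box plot needs that extra layer of interpretation, it helps to look at the numbers it is drawn from. The sketch below computes them for a small made-up sample. It assumes the common 1.5 × IQR rule for deciding which points are drawn as outlier dots, which the video does not specify, and `box_plot_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the numbers a box-and-whisker plot is drawn from."""
    data = sorted(values)
    # The box: first quartile, median, third quartile.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach to the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up sample for illustration only, not course data.
sample = [12, 14, 15, 16, 16, 17, 18, 20, 35]
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The box covers the central tendency (quartiles and median), the whiskers show dispersion, and anything outside the fences is plotted as a separate dot.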
07:48
So now we're gonna fast forward to what we're gonna talk about for the rest of this course. So we talked about how the same data sets can be sliced and diced a whole bunch of different ways. For your Green Belt course, this is the data that I will do all of the analyses with.
08:07
So we're looking at our American presidents:
08:09
their height in inches, the year they were born, and the year they were elected. So when we do our correlation analysis and our histograms, this is the data set we're going to work from. It is going to be available with your Green Belt course.
08:26
So if you want to go in and play with it and see how I got there, it is an Excel file.
08:30
It will be available as course material.
08:33
So with that, today we went over our introduction to distributions, and we talked about why it is that we study distributions in a physical sense, so we have that validation of reality. It lends credibility to the messages that you tell.
08:50