Sampling Techniques

9 hours 53 minutes
Video Transcription
Hi, guys. Welcome back. I'm Katherine MacGyver, and this is your Lean Six Sigma Green Belt.
Today we're going to go over some data sampling techniques. At this point in your course, you have a measurement system already decided. You did that as part of your Measure phase,
and now we're actually going to be collecting real data. So with that, we're going to go over some of the most common data sampling techniques:
simple random, which is the one you'll most likely be familiar with, segmented, and stratified sampling.
So, simple random is the most common. I say that because whenever you think about sampling, you think of this: OK, out of 100 data points, I want to measure 10 of them, so I'm just going to go ahead and grab 10 at whim.
With that, a couple of things to keep in mind. Remember that you want to think about your sample size. We're going to use 100 and 10 through this module because it's easier to understand.
But hopefully, when you're working on your project, you have vast amounts of data to pull from, preferably thousands of data elements, because the larger your sample size, the more representative it is of your population, or the more it looks like a mini-me.
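To make the point about sample size concrete, here is a minimal Python sketch. The population (10,000 made-up cycle times), the sample sizes, and the helper name `mean_abs_error` are all assumptions for illustration, not part of the course material. It repeatedly draws samples and measures how far the sample average lands from the population average.

```python
import random
import statistics

def mean_abs_error(population, sample_size, trials, rng):
    """Average gap between the sample mean and the population mean."""
    true_mean = statistics.fmean(population)
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # draw without replacement
        gaps.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(gaps)

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 made-up cycle times in minutes.
population = [rng.gauss(30, 8) for _ in range(10_000)]

errors = {n: mean_abs_error(population, n, trials=200, rng=rng) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"sample size {n:>4}: average error {err:.2f} minutes")
```

With these made-up numbers, the average error should shrink as the sample size grows, which is the "mini-me" idea in numeric form.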
Simple random sampling: 100 is your population, we want 10, and we just grab them. A couple of things to keep in mind. This is actually a very bad example, because it says sample every three. What you'll notice about sampling every three is
that's a pattern. Humans, by nature, like to see patterns in things. It's a
cognitive bias called patternicity, where we want to create that. So with that being said, there is a possibility that if you create a structure like every three or every five, you are in fact introducing bias into your sampling technique.
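Here is one way a fixed interval can go wrong, sketched in Python. The setup is an assumption made up for illustration: a log of 99 records entered in a repeating day, swing, night order. When the sampling interval lines up with a cycle in the data, the sample sees only one slice of it.

```python
from collections import Counter

# Hypothetical log: 99 records entered in a repeating day, swing, night order.
shifts = ["day", "swing", "night"]
records = [shifts[i % 3] for i in range(99)]

# Take every third record (the 3rd, 6th, 9th, ...).
every_third = records[2::3]

print(Counter(records))      # all three shifts appear equally in the full log
print(Counter(every_third))  # the every-third sample contains one shift only
```

Real data rarely cycles this neatly, but weaker versions of the same effect are hard to spot by eye.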
To combat this, my recommendation is to use a random number generator. With 100 numbers, you tell your random number generator that you want 10 numbers out of it, and it will randomly create them. There's an algorithm behind it
that is designed to have no discernible pattern.
Excel has this, and they're very widely available online. I have found that it is the easiest way to minimize my own implicit bias. What I mean by that is I tend to like odd numbers:
ones, threes, fives, sevens. If you've noticed, throughout the course the examples that I give
all tend to be odd. That was showing up in my random sampling: I'd want 21, 27, 33, that sort of thing. So we're starting to see a pattern in that. So, simple random sampling is very straightforward. You decide what your sample size is, and you pull that many data elements
from your data set.
A trick of the trade: use a random number generator when possible. If not, do try to minimize the patterning when you pull it, so avoid every three, every five, every 10, and you're not potentially introducing some bias into your data. Because, remember, the goal of sampling is to be representative
of your population.
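As a minimal sketch of the random number generator approach, here is the 10-out-of-100 draw in Python using the standard library's `random.sample`. Numbering the data points 1 to 100 is an assumption for illustration.

```python
import random

population = list(range(1, 101))  # data points numbered 1 to 100

# Pass a seed, e.g. random.Random(7), if you need to reproduce the same draw.
rng = random.Random()

# Pick 10 distinct data points; every data point has the same chance of selection.
simple_random = sorted(rng.sample(population, k=10))
print("data points to measure:", simple_random)
```

Because the generator picks the numbers, a personal preference (for odd numbers, say) never gets a chance to shape the sample.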
The next sampling technique is a little bit more complicated. We're talking about segmentation sampling.
What segmentation sampling is: we take our data set and we divide it into logical chunks. From those logical chunks, we get an equal sample size from each.
So, for example, we're going to work with our same 100 data points, but we're going to make it 99 for ease of math. Let's say that those 99 data points can be divided into three sections of 33. Regardless of the proportion within the population, we're going to say
day shift, swing shift, night shift. We want
chunks from each of these, or male and female, or 18 to 25 and 26 to 32, or however it makes sense to slice your data. For our example, we have three buckets: day, swing, and night shift, and each has 33 data points in it.
From each of those, we're going to randomly pull
five. So we have taken our data and sliced it into segments. For the sake of this example, we're going to say that those are all equal segments, and we pull the same size sample from each chunk. They do not have to be equal segments. Let's say that
60% of our data elements were from that first bucket, day shift. We're still going to pull five, so that we have an equal-size sample
from each segment.
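The shift example can be sketched in Python as follows. The record IDs and the helper name `segment_sample` are hypothetical; the 99 data points, three segments of 33, and five pulls per segment come from the example above.

```python
import random

def segment_sample(segments, per_segment, rng):
    """Draw the same number of records at random from every segment."""
    return {name: rng.sample(records, per_segment)
            for name, records in segments.items()}

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical record IDs: 99 data points, 33 per shift.
segments = {
    "day": list(range(1, 34)),
    "swing": list(range(34, 67)),
    "night": list(range(67, 100)),
}

picked = segment_sample(segments, per_segment=5, rng=rng)
for shift, ids in picked.items():
    print(shift, sorted(ids))
```

The same function works unchanged if the segments are unequal in size: each one still contributes five records, as long as it holds at least five.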
This is very common in market research. So if there are marketers out there, you're saying, "No, it's not quite that easy." It can be when we're looking at it from a continuous improvement standpoint. I mean, we're not asking for data trees to break in our segment. We're simply asking what is a logical divide, and this will become relevant as we start looking at our distributions.
Data tells a story.
But segmentation sampling: segments, logical chunks of your data, and equal-size samples from each of those segments to represent your population. So we're looking at equal numbers from day, swing, and night shift in this example.
Stratified sampling. Stratified sampling is still going to be segmented, and it is sometimes called stratified segmentation sampling. But what it is, is now we're looking for proportionate weights.
So this example shows equal weights for all three groups, or segments, or strata.
But what we're looking for here is a representative population. So, for example, if 60% of our data was collected on day shift, then correspondingly 60% of our sample should come from that day shift count.
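Here is a minimal Python sketch of proportionate sampling. Only the 60% day shift figure comes from the example; the 20% and 20% split for swing and night, the record IDs, and the helper name `proportional_allocation` are assumptions chosen so the shares come out as whole numbers.

```python
import random

def proportional_allocation(sizes, sample_size):
    """Give each stratum a share of the sample matching its share of the population.

    Simple rounding is used, so with uneven shares the counts may not add up
    exactly to sample_size and would need a manual adjustment.
    """
    total = sum(sizes.values())
    return {name: round(sample_size * n / total) for name, n in sizes.items()}

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical record IDs: 100 data points, 60 day, 20 swing, 20 night.
strata = {
    "day": list(range(1, 61)),
    "swing": list(range(61, 81)),
    "night": list(range(81, 101)),
}
sizes = {name: len(records) for name, records in strata.items()}

allocation = proportional_allocation(sizes, sample_size=10)
sample = {name: rng.sample(strata[name], allocation[name]) for name in strata}

print(allocation)  # how many records to pull from each stratum
for shift, ids in sample.items():
    print(shift, sorted(ids))
```

Compared with segmentation sampling, the only change is how many records each group contributes: a share that mirrors the population instead of a fixed equal count.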
Remember, the key in sampling is always, of course, a
representative population. So if you were wherever Wonder Woman is from and you pulled a random sample, but you found that of your sample size of 10 you had eight men, you know that that is not going to be representative.
But if we think about where Wonder Woman's from, and we say that 90% of them are
female, then when we do our random sampling, we're going to expect about nine of our 10 data points to be female. What this gives us is a greater sense of what the data actually looks like from a population standpoint. This can be challenging
if you don't know the breakdown of your population.
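A rough Python check of the Wonder Woman numbers, under an assumption not stated in the lecture: the population is large enough that each of the 10 draws can be treated as independent with a 90% chance of being female (a binomial model). The variable names are illustrative.

```python
from math import comb

sample_size = 10
p_female = 0.9          # assumed share of women in the population
p_male = 1 - p_female

# Expected number of women in a random sample of 10.
expected_women = sample_size * p_female

# Chance of drawing 8 or more men purely by luck, under the binomial assumption.
p_eight_or_more_men = sum(
    comb(sample_size, k) * p_male**k * p_female**(sample_size - k)
    for k in range(8, sample_size + 1)
)

print(f"expected women in the sample: {expected_women:.1f}")
print(f"chance of 8 or more men: {p_eight_or_more_men:.2e}")
```

A sample with eight men is so unlikely under these assumptions that it signals a problem with how the sample was pulled, not bad luck.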
So stratified sampling tends to be used in more sophisticated data collection plans, because if you're not aware of what the slice and the mix of your data is, you're not going to be able to stratify it so that you have a proportionate sample. The difference between segmentation
sampling and stratified sampling is that even though you have those different areas, segments, or strata,
you are looking at either equal sample sizes, as we do in segmentation, or proportionate sample sizes, as in stratified sampling.
So today we went over our most common data sampling techniques. Remember, the most important thing from simple random is don't introduce your own pattern. Segmentation: these are going to be logical delineations that have equal sample sizes. Stratified: still logical delineations, but now with proportionate
sample sizes.
With that, in our next module we're going to go over measurement scales, so I will see you guys there.