Data Lifecycle

Time
6 hours 3 minutes
Difficulty
Intermediate
Video Transcription
00:00
Hello and welcome back to the Splunk Enterprise Certified Administrator course on Cybrary. This is lesson 5.2, where we'll be discussing the data lifecycle for data that's ingested into Splunk. The learning objectives for this lesson are to discuss the three types of buckets,
00:16
essentially, what their key characteristics are and what causes
00:20
one bucket to transition to being a different type of bucket. We'll talk about the four stages of the data lifecycle and then also how to thaw frozen data.
00:31
Why are we learning this? As a Splunk administrator, you're going to be responsible for the data that gets brought into the environment, and most likely there will be pretty strict retention requirements. If you have an internal audit team, you'll likely hear from them and have to
00:49
prove that you're keeping data as long as you're supposed to.
00:52
And so understanding this process and understanding how to recover and search on data that's been frozen is gonna be very important in your role as a Splunk administrator.
01:03
So as we discussed at the end of the last video, there are three types of buckets in Splunk: hot, warm, and cold buckets.
01:11
A hot bucket is going to hold the most recent data that's being ingested into Splunk. Hot buckets are the only buckets that are writable, so any incoming data will always be written to a hot bucket.
01:23
Key characteristics are that they are writable and searchable, and that you'll store these buckets on the fastest storage that's available on your indexers.
01:33
Some things would cause a hot bucket to roll, which is basically Splunk's terminology for transitioning between bucket types, so it would go from being a hot bucket to a warm bucket. What would cause that transition is either Splunk restarting,
01:49
too much time passing since the creation of the bucket,
01:53
the size of the bucket getting too big or there being too many hot buckets. There is a limit to how many hot buckets you can have, so if you reach that, it'll automatically roll an older hot bucket to warm to make room for a new hot bucket.
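The four roll triggers just listed can be sketched as a single check. This is only an illustration of the logic, not how Splunk implements it: the `HotBucket` class, the function name, and every threshold value below are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass


@dataclass
class HotBucket:
    """Hypothetical stand-in for a hot bucket; not a Splunk data structure."""
    age_secs: int
    size_mb: int


def hot_roll_reasons(bucket, hot_bucket_count, splunk_restarting,
                     max_age_secs=86_400, max_size_mb=750, max_hot_buckets=3):
    """Return the reasons (if any) that a hot bucket would roll to warm.

    All thresholds are illustrative assumptions, not Splunk defaults.
    """
    reasons = []
    if splunk_restarting:
        reasons.append("restart")
    if bucket.age_secs > max_age_secs:
        reasons.append("too old")
    if bucket.size_mb > max_size_mb:
        reasons.append("too big")
    # A new hot bucket has pushed the count past the allowed limit.
    if hot_bucket_count > max_hot_buckets:
        reasons.append("too many hot buckets")
    return reasons
```

Any one reason is enough to trigger the roll; the function returns all of them so you can see which condition fired.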
02:09
Then there are warm buckets, which are completed buckets. What I mean by that is a warm bucket has all the data that's going to be written to it, so it's full. It still holds very recent events, so you should still put it on pretty fast storage, as it's much more likely to be searched on regularly.
02:27
Key characteristics would be that it's searchable but no longer writable, and it should still be on the fastest available storage on your indexers. What would cause a warm bucket to roll to cold would be, basically, if you run out of space in your index, if your data
02:46
ages out, or if you have too many warm buckets and you need to make room for more.
02:57
Then there is the third type of bucket, which is a cold bucket. These are for your older data, which is less likely to see as much search traffic, so you can generally put these on a slower storage medium if you want to save some cost.
03:15
The reasons that these would roll to frozen
03:19
would be that you ran out of space for the index or the data has passed the maximum age for retention.
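The two cold-to-frozen triggers amount to an "either/or" test. Here is a minimal sketch of that test; the function and parameter names are hypothetical. In `indexes.conf`, the corresponding limits are, as far as I'm aware, set with `frozenTimePeriodInSecs` (age) and `maxTotalDataSizeMB` (index size), but the code below does not read any Splunk configuration.

```python
def should_freeze(newest_event_age_secs, index_size_mb,
                  max_age_secs, max_index_size_mb):
    """Return True when a cold bucket would roll to frozen.

    Either trigger alone is sufficient: the data has aged out,
    or the index has grown past its size limit.
    """
    aged_out = newest_event_age_secs > max_age_secs
    index_full = index_size_mb > max_index_size_mb
    return aged_out or index_full
```

For example, `should_freeze(100, 10, 50, 100)` is true because of age alone, even though the index is nowhere near full.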
03:29
So this is going to show you basically how data goes
03:34
through these different buckets and what that looks like. These are the four steps, and we'll go through each one of them. First we'll look at how new data comes into Splunk, where it's written directly to a hot bucket in the index. And then, as we mentioned
03:52
on the previous slide, there's a number
03:54
of criteria that could be hit, but one of those events is going to happen, and it's going to trigger that hot bucket to roll to a warm bucket. So now that bucket is complete, it's been written to as much as it's going to be, and now it'll sit in this warm bucket space, which
04:11
all that really means is, you know, the bucket has officially been named because it knows the earliest and latest event that it contains, and it's still going to be on the same storage because hot and warm are stored in the same place. But then, after a certain amount of time or too much data ingest occurs,
04:30
your data will roll
04:32
to cold. And when this happens, if your cold storage is on a different storage medium, then the data will be moved from the faster storage to the cold storage directory, and you'll still be able to search on that cold data.
04:53
And then finally, once the cold data either ages out or hits the size limitation for the index,
05:00
that data will roll to frozen. What that means, since we haven't talked about this yet, is that by default the data will be deleted if you don't set a frozen directory. So once the data is done in cold buckets, once that
05:17
retention period or retention size that you've set
05:20
for your index has been reached, that data will just be deleted.
05:25
But if you specify a directory, then your data will be saved to frozen. So, as I mentioned, by default there is no frozen archive; the data is just deleted. The frozen path can be set, and once you have data that's frozen, it isn't searchable.
05:41
But there is the capability to thaw it, which makes it searchable again in Splunk and also does not count as re-ingesting data. This won't hurt your
05:51
license at all.
05:54
So if you have
05:57
a frozen data directory and you thaw it, then technically there could be a fifth step of this data lifecycle, where you revive that frozen data and perform some searches on it if you'd like to.
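The whole lifecycle, including the optional thaw step, can be pictured as a small state machine. The sketch below is a teaching model only; the `Stage` enum, the `roll` function, and the `frozen_dir_configured` flag are hypothetical names, not Splunk APIs.

```python
from enum import Enum


class Stage(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"
    THAWED = "thawed"


# Transitions as described in the lesson; thawing is the optional fifth step.
NEXT_STAGE = {
    Stage.HOT: Stage.WARM,
    Stage.WARM: Stage.COLD,
    Stage.COLD: Stage.FROZEN,
    Stage.FROZEN: Stage.THAWED,
}

# Frozen data is the only stage that cannot be searched.
SEARCHABLE = {Stage.HOT, Stage.WARM, Stage.COLD, Stage.THAWED}
# Only hot buckets accept incoming data.
WRITABLE = {Stage.HOT}


def roll(stage, frozen_dir_configured=True):
    """Advance one stage; return None if the data is deleted or has no next stage."""
    if stage is Stage.COLD and not frozen_dir_configured:
        return None  # default behavior: data leaving cold is deleted
    return NEXT_STAGE.get(stage)
```

Calling `roll(Stage.COLD, frozen_dir_configured=False)` returns `None`, which models the default case where no frozen directory is set and the data is simply deleted.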
06:09
And this is the process for thawing frozen data. You simply copy the frozen bucket into the thawed directory,
06:17
and then you run the Splunk command line
06:20
command, splunk rebuild, and then you'll have to restart the indexer. At that point, you will be able to actually search on that data.
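The thaw steps above (copy, rebuild, restart) could be scripted roughly as follows. This is a sketch under assumptions: `plan_thaw` is a hypothetical helper, the paths you pass in are your own, and `splunk_bin` is assumed to resolve to the Splunk CLI on the indexer. The function performs the copy but only returns the CLI commands, so nothing is run against Splunk until you choose to execute them.

```python
import shutil
from pathlib import Path


def plan_thaw(frozen_bucket, thawed_dir, splunk_bin="splunk"):
    """Copy a frozen bucket into the thawed directory and return the
    follow-up CLI commands. The commands are returned, not executed."""
    frozen_bucket = Path(frozen_bucket)
    target = Path(thawed_dir) / frozen_bucket.name
    # Step 1: copy the frozen bucket into the index's thawed directory.
    shutil.copytree(frozen_bucket, target)
    return [
        # Step 2: rebuild the bucket so it becomes searchable again.
        [splunk_bin, "rebuild", str(target)],
        # Step 3: restart the indexer.
        [splunk_bin, "restart"],
    ]
```

On the indexer, each returned command could then be run with something like `subprocess.run(cmd, check=True)`.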
06:31
So that's a useful trick in case you have some data that you don't necessarily need to search often but want to store, because maybe you have, like, a seven-year retention and you don't want to use up your Splunk storage for that long. So you put it in this frozen storage on some
06:47
real cheap mass storage or something, but then you can selectively bring it back if needed, just to prove that, yes, we still have this data. So it's a good feature to know that you have available, too.
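Since age-based retention in `indexes.conf` is expressed in seconds (the `frozenTimePeriodInSecs` setting, as far as I'm aware), it can help to convert policy periods explicitly. The helper names below are hypothetical, and the 90-day searchable window is an invented example value, not something from the lesson.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_secs(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


def years_to_secs(years, days_per_year=365):
    """Convert a retention period in years to seconds (ignores leap days)."""
    return days_to_secs(years * days_per_year)


# Hypothetical policy: keep 90 days searchable, archive frozen data for 7 years.
print(days_to_secs(90))   # 7776000
print(years_to_secs(7))   # 220752000
```

The shorter figure is the kind of value you would apply to the index itself; the seven-year figure is how long the frozen archive would need to be kept on the cheaper storage.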
07:01
Now for a quick knowledge assessment: what happens to hot buckets when Splunk restarts? You have a second here to pause if you need to and select an answer, and we'll answer this on the next slide.
07:14
So what happens is, when Splunk restarts, hot buckets roll to warm automatically, and then new hot buckets will be created as Splunk starts up.
07:24
So in summary, we discussed the three types of buckets: hot, warm, and cold. We talked about the four steps of the data lifecycle and what events would trigger data to keep moving through that lifecycle. And we also talked about what happens with data when it rolls to frozen by default
07:42
or how to configure
07:44
a frozen directory. And then, if you have a frozen directory, we discussed how you can go about thawing that frozen data so that you can run some searches against it.
07:56
That wraps up this lesson. We'll see you in the next video.