Data Pipelines

6 hours 3 minutes
Video Transcription
Hello and welcome back to the Splunk Enterprise Certified Administrator course on Cybrary. We're entering Module 10, the final module for this course, where we'll be talking about tuning inputs.
So, yeah, as you can see on the course outline, we finally made it; we're in the last module. So we're going to get through these next couple of lessons, finish out the course, and get you ready to take your exam. So let's talk about Lesson 10.1, where we'll be talking through the data pipelines in Splunk.
So the learning objectives here are to talk about the four subcomponents of the data pipeline, then talk about the steps involved in each subcomponent, and talk about how we can manipulate some configurations and how they interact with this process to enhance indexing efficiency.
And then we'll discuss as well how heavy forwarders affect the data pipeline: basically, how it's distributed between devices, which we've gone into in previous videos.
So why are we learning this? Now that we're bringing data in, there are basically some Splunk configurations, some props configurations, that you'll always make, or should always make, to accompany your inputs, and they directly affect performance. The way they help is something you'll be able to understand after we talk about the data pipeline and how that works, and then we'll talk about the actual configurations after that to show you: look, this helps with this area, or this makes that more efficient.
So we're learning this as a baseline, so that when we learn the next step, it makes more sense. But also, we're going to learn this so that, from a troubleshooting perspective, if you know all the steps and which order they happen in, that can be really helpful for narrowing down where issues are happening with your inputs or with your props configurations when you're troubleshooting.
So here is a diagram that is going to give us basically a visual overview of what the data pipelines in Splunk look like. So basically whatever data you have coming in on the left here, the TCP, tail, exec, or FIFO, whatever data source you have, will send the data in, and it enters this first queue, which is called the parsing pipeline.
And so this phase is basically when encoding is performed; it's when the chunks of events that the forwarders send to the indexers or heavy forwarders are broken into individual events. And basically the header that is also associated with that data is read, to associate the proper metadata with each event.
Then once all that's done, you move into the merging pipeline. So I guess it's important to mention here that the default behavior for this line breaker actually is not to break event by event; it's to break line by line, and then in this phase, it merges those lines back together into whole events before moving to the typing pipeline, where if you have any props or transforms that are going to do, like, sed commands or rekeying data or whatever, that will happen here.
And also the annotator will run, which will basically create the punct field in Splunk, which is just a list of all of what Splunk calls major and minor breakers, basically a list of punctuation characters. That gives you essentially a format signature of each event, and it can be super helpful if you're looking for unique event structures for parsing, or if you are bringing in data and hoping to figure out what your different unique data formats are. You can use the punct field, or if you are looking through error logs and you dedup on the punct field, you only get one of each unique log type, so it can make it much easier to look through your data and just find one of each unique event. Or you can find the rarest punctuation pattern that is generated in Splunk. It's just very useful, but this is when that field is created.
And then finally, the data enters the indexing pipeline, where it's either sent to an external system or it is placed on an index.
So as we mentioned, there are a couple of places throughout this process where you can get some serious efficiency gains, and I highlighted them in red here. So one option, and this is what we're going to discuss in the next video, is SHOULD_LINEMERGE. If you set that to false and you set your LINE_BREAKER in props.conf to an actual unique value that represents the end of each event, then this line breaker phase will actually break only on new events. And so then it doesn't need this merging pipeline at all, where it reassembles the events, because it's already breaking them into events instead of breaking them into single lines and then stitching them back together into events. So that's one place where you can gain some serious efficiency.
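As a rough illustration of the two settings just described, a props.conf stanza might look like the sketch below. The sourcetype name `my_app_logs` and the date pattern are hypothetical, assumed only for this example; the right LINE_BREAKER regex depends on how events in your own data begin or end.

```ini
# props.conf (sketch; the sourcetype name and date format are assumptions)
[my_app_logs]
# Break directly on event boundaries so no line merging is needed
SHOULD_LINEMERGE = false
# The first capture group is the delimiter between events and is discarded.
# Here each event is assumed to start on a new line with a date like 2024-01-31.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
```

With a stanza like this, a new event would begin only where one or more newlines are followed by a date, so continuation lines such as a stack trace stay attached to the event they belong to.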
Another is, if you're not going to use this punct field at all, you could set ANNOTATE_PUNCT to false, and then it skips that step entirely.
I would not advise doing that, though, because the punct field is incredibly valuable.
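For reference, turning the punct field off is also done per stanza in props.conf. This is a minimal sketch; the sourcetype name `my_app_logs` is hypothetical.

```ini
# props.conf (sketch; my_app_logs is a hypothetical sourcetype)
[my_app_logs]
# Skip generating the punct field for this sourcetype (the default is true)
ANNOTATE_PUNCT = false
```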
Let's do a quick knowledge assessment. So which config file has the ANNOTATE_PUNCT and SHOULD_LINEMERGE attributes? If you were paying attention on the last slide, that should be a super easy one. So take a second to pick an answer here, and we'll go over it in the next slide.
So the answer is props.conf. That is where you configure both of those attributes, as I mentioned in a previous slide or two slides ago. It is one of the Big Eight props configurations that we should be making on each of our inputs, which we'll also discuss more in the next video.
So now on to the final concept for the lesson: how the heavy forwarder affects the data pipelines.
So I've mentioned before that if you have a heavy forwarder, for any data that goes through the heavy forwarder, parsing will be done on the heavy forwarder. And so this is what I mean by that: these three phases, the parsing pipeline, the merging pipeline, and the typing pipeline, will occur on the heavy forwarder, and then the data will be forwarded to the indexer, and then all the indexer does is save it to disk. And so that's what the heavy forwarder actually does. It just breaks these phases out. So it does these three pipelines, and the indexer just does the last pipeline.
So that's important to know.
So, in summary, we discussed the four phases of the data pipeline.
We have parsing, merging, typing, which includes annotating, and then we also have indexing. Then we talked about the processing that occurs in each phase. We talked about how you could eliminate the merging pipeline altogether by using a proper event line breaker and SHOULD_LINEMERGE equals false.
And then we also discussed how heavy forwarders impact the data pipeline by splitting up these pipelines and basically having the first three occur on the heavy forwarder, and then the indexing pipeline occurs on the indexers. So that wraps up everything you need to know about data pipelines. We'll talk about the props configurations and transforms in the next video, which will link pretty heavily back to this information. So I look forward to seeing you in the next video and getting into that information.