4.3 Ways to Get Data

Time
1 hour 59 minutes
Difficulty
Beginner
CEU/CPE
2
Video Transcription
00:00
>> Module 4, talking about
00:00
the different ways to get data into Splunk.
00:00
In this video, we'll discuss
00:00
some examples of the many data sources
00:00
Splunk can work with.
00:00
We'll talk about ways to get data,
00:00
go through creating an index,
00:00
add some data by uploading a file,
00:00
talk about source types,
00:00
and create some field extractions.
00:00
Then we'll finish off with the quiz.
00:00
Here's a nice image of examples of
00:00
what Splunk can index from splunk.com.
00:00
This highlights different types of data such as
00:00
Windows event logs, Linux command results,
00:00
events from Cloud services,
00:00
weblogs, database queries,
00:00
NetFlow data, clickstream data,
00:00
power consumption information, and more.
00:00
Down at the bottom here are some common sources of data.
00:00
In the middle we see popular forms of data.
00:00
For example, you might retrieve metrics on weblogs or
00:00
ingest data from tickets
00:00
opened by intrusion detection systems.
00:00
All these types of information can be
00:00
transformed into events that are
00:00
searchable and usable in different ways.
00:00
There are many ways to get these types of data.
00:00
You can, for example,
00:00
monitor files and directories,
00:00
upload data, run scripts and collect the results,
00:00
listen on network ports,
00:00
including listening for syslog messages,
00:00
collect events using WMI,
00:00
run queries against connected databases,
00:00
perform API calls, and other methods.
00:00
We'll be uploading a file for
00:00
a simple demonstration in this video.
00:00
As a review, a Splunk index is a data repository.
00:00
When raw data is turned into events,
00:00
it gets put into an index.
00:00
These indexes are helpful for running efficient searches.
00:00
When you're able to search across a specific index,
00:00
that can speed up your searches.
00:00
For example, you might have an index called Cisco
00:00
ASA that just contains Cisco ASA logs.
00:00
When you're looking for ASA logs,
00:00
you wouldn't want to search across
00:00
all your Windows Event logs
00:00
while trying to find something.
00:00
Indexes can also help
00:00
you apply more control to your data.
00:00
For example, if you know you need
00:00
to keep authentication logs for
00:00
six months but only need to
00:00
keep application logs for one month,
00:00
you can apply those different kinds of
00:00
retention policies by index.
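As a rough illustration of per-index retention, here is a minimal Python sketch that converts the two retention periods from this example into seconds. The index names and the day counts (180 and 30) are assumptions made for illustration. Splunk's indexes.conf expresses retention in seconds through the frozenTimePeriodInSecs setting, so a conversion like this is the arithmetic you would do before setting that value for each index.

```python
# Minimal sketch: convert per-index retention periods into seconds.
# Index names and day counts are assumptions for illustration only.
SECONDS_PER_DAY = 24 * 60 * 60

retention_days = {
    "authentication": 180,  # roughly six months
    "application": 30,      # roughly one month
}


def retention_seconds(days: int) -> int:
    """Return the retention period expressed in seconds."""
    return days * SECONDS_PER_DAY


for index_name, days in retention_days.items():
    print(f"{index_name}: keep for {retention_seconds(days)} seconds")
```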
00:00
Additionally, you can easily limit
00:00
users to certain types of data by only
00:00
allowing them to search across
00:00
specific indexes that apply to their jobs.
00:00
Source types are used to identify the structure of
00:00
events and Splunk uses
00:00
these to format the data while indexing.
00:00
You might have multiple source types in the same index.
00:00
For example, you might collect
00:00
all your WebSphere logs in an index called WebSphere,
00:00
but WebSphere activity logs are
00:00
formatted differently and are marked with
00:00
a different source type than WebSphere system error logs.
00:00
You can also use source types to
00:00
narrow down your searches in Splunk.
00:00
Field extraction is pulling out fields from event data.
00:00
Splunk automatically recognizes fields for
00:00
some source types and you can
00:00
also manually extract fields on your data.
00:00
In this example, we have
00:00
a NetScreen firewall event and
00:00
then we have lots of
00:00
potential fields including an action field.
00:00
In this case, the field name could be
00:00
action and the field value would be deny.
00:00
In another event, the field name would still be action,
00:00
but the field value could be allow.
00:00
The field name does not have to
00:00
be specified in the event.
00:00
This Jun at the beginning could be
00:00
extracted and have a field name of
00:00
month and a field value of Jun or June.
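To make the idea of field names and field values concrete, here is a small Python sketch. It is not how Splunk performs extraction internally. The sample event is invented for illustration and only loosely resembles a firewall log. It shows both cases described above: a field whose name appears in the event (action) and a field whose name does not (month).

```python
import re

# Hypothetical firewall-style event, made up for this sketch.
SAMPLE_EVENT = (
    "Jun  2 14:55:46 fw01 NetScreen device_id=fw01 "
    "action=Deny src=10.1.1.5 dst=192.168.2.10"
)

# key=value pairs carry their own field names in the event text.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")
# The leading month has no name in the event, so we supply one ourselves.
MONTH = re.compile(r"^(?P<month>[A-Z][a-z]{2})\s")


def extract_fields(event: str) -> dict:
    """Return a dict of field name to field value for one event."""
    fields = dict(KEY_VALUE.findall(event))
    match = MONTH.match(event)
    if match:
        fields["month"] = match.group("month")
    return fields


fields = extract_fields(SAMPLE_EVENT)
print(fields["action"], fields["month"])
```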
00:00
With that, we're going to jump into a basic example.
00:00
I have my Splunk server up and on it I have
00:00
a file filled with some example Exchange mail logs.
00:00
Normally we probably wouldn't want to
00:00
upload a file to get these types of events,
00:00
but we'll do it for this example.
00:00
I'm going to my Splunk web interface.
00:00
I'm going to click on "Settings" and "Add Data".
00:00
I also have the option on my main page here.
00:00
From here, I'm going to scroll down and click "Upload".
00:00
Then I'm going to select the file we were just
00:00
looking at. Click "Next".
00:00
Splunk automatically did a good job of breaking
00:00
these events out and identifying the timestamps.
00:00
If I click on source type here,
00:00
I can try some of the pre-trained
00:00
source types Splunk has.
00:00
Under email, for example,
00:00
if I click "procmail" as a source type,
00:00
it no longer breaks that out into
00:00
events and leaves them in a clump.
00:00
That's obviously not the right source type for this.
00:00
I'm going to go back to the default
00:00
here and make my own source type.
00:00
There are some other options
00:00
under here that you can play with.
00:00
But I'm just going to click "Save As".
00:00
I'm going to call this exchange logs
00:00
and put it in the email category.
00:00
Now, if I want to upload
00:00
the same type of file in the future,
00:00
I can pick the source type.
00:00
I'm going to click "Next" on this.
00:00
From here, I have to decide what the host is.
00:00
Since I uploaded the file,
00:00
leaving it as my hostname makes sense,
00:00
so I'm just going to keep that as is.
00:00
It's also currently set to go to the default index.
00:00
It's often a good idea to put things into
00:00
the default index until you make sure things are working.
00:00
But I'm going to create a new index for
00:00
this data just by clicking here.
00:00
For index name, I'm going to call it exchange.
00:00
I'll leave it as events and leave all the rest of
00:00
the settings as default for now.
00:00
I'm going to save this,
00:00
click "Review" up here, and "Submit".
00:00
From here, I can run a search across my new data.
00:00
It's already picked out
00:00
my index here that I assigned it,
00:00
and we've got all these events.
00:00
If you notice, here's the timeline of these events.
00:00
I can look at one of these and it's
00:00
broken out some of the fields such as
00:00
the source type, and along the side here it's
00:00
got things like hour and day and minute broken out.
00:00
But I'd like to have more fields.
00:00
I'm going to click here and extract new fields.
00:00
Here I'll just select
00:00
a sample event to work with. Click "Next".
00:00
Now I can decide between using a regular expression
00:00
or delimiters to break out my field.
00:00
Delimiters would be a good option if my fields are
00:00
separated by something like a tab or a special character.
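For comparison, here is a minimal Python sketch of what delimiter-based extraction amounts to: split the event on the delimiter and pair each piece with a field name. The sample line and the field names are assumptions for illustration, not taken from the file used in this video.

```python
def split_delimited(event: str, names: list, delimiter: str = "\t") -> dict:
    """Split an event on a delimiter and pair each value with a field name."""
    values = event.split(delimiter)
    return dict(zip(names, values))


# Hypothetical tab-separated event and field names.
sample = "2019-06-01\t203.0.113.10\tdeny"
print(split_delimited(sample, ["date", "ip", "action"]))
```

This only works when every event has the same fields in the same order. When it does not, a regular expression is usually the better fit.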
00:00
In this case, I'm going to
00:00
pick regular expression and hit "Next".
00:00
You have the option of writing your own regex
00:00
here or trying to make Splunk do the work.
00:00
To try to get an extraction,
00:00
I'm going to highlight a field.
00:00
I'm going to look at this first IP and click on it.
00:00
I'm going to name this field IP and hit "Add Extraction".
00:00
Now down at the bottom we can go and see
00:00
how this works with other events.
00:00
If we see here on different events,
00:00
it's pulled out this IP field even with different values.
00:00
I can click here to see if there's
00:00
anything that doesn't match,
00:00
and there's nothing, so I think
00:00
we're in pretty good shape for this.
00:00
Now, if I want another field,
00:00
I can do the same thing.
00:00
I'm going to look at this "To" email address.
00:00
I'm going to call this recipient and add the extraction.
00:00
Now if I look at the sample events,
00:00
I'm going to remove that.
00:00
I got a notification that it failed.
00:00
I think I missed that first letter there.
00:00
Let's see if this does better.
00:00
Recipient. That worked.
00:00
But if I scroll down here, there's a problem.
00:00
I called it recipient because it said
00:00
"To" here and it made sense for the sample event I had.
00:00
But now I'm also getting it for "From".
00:00
I probably want to work more on this field extraction.
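The problem here can be sketched in Python: a pattern that matches any email address will also pick up the sender, while a pattern anchored on the label in front of the address will not. The event lines and the to= / from= labels below are hypothetical and do not reflect the actual format of the log file in this video.

```python
import re

# Hypothetical mail events, invented for this sketch.
EVENTS = [
    "203.0.113.10 to=bob@example.com status=delivered",
    "203.0.113.10 from=alice@example.com status=received",
]

# Loose pattern: matches any email address, whichever label precedes it.
LOOSE = re.compile(r"(?P<recipient>[\w.+-]+@[\w.-]+)")
# Anchored pattern: only matches an address that follows the "to=" label.
ANCHORED = re.compile(r"\bto=(?P<recipient>[\w.+-]+@[\w.-]+)")


def extract(pattern: re.Pattern, event: str):
    """Return the captured recipient, or None if the pattern does not match."""
    match = pattern.search(event)
    return match.group("recipient") if match else None


for event in EVENTS:
    print(extract(LOOSE, event), extract(ANCHORED, event))
```

The loose pattern returns the sender's address for the second event, which is the same kind of false match seen in the preview. The anchored pattern returns nothing for that event.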
00:00
If I wanted to, I could click "Show
00:00
regular expression" and write my own.
00:00
But I'm just going to remove it to keep this video
00:00
short and we'll keep this IP field.
00:00
Click "Next". I'll review it and this all looks good.
00:00
It's pulling out the IPs on
00:00
all these different events. Click "Next".
00:00
Everything looks good here, and "Finish".
00:00
Now I can explore
00:00
the field that I just created and search.
00:00
Now if we were to rerun that search,
00:00
there should be a new field value
00:00
there that we could use.
00:00
We had index equals
00:00
exchange.
00:00
I need "All time" since there were
00:00
events that were older than the last 24 hours.
00:00
Now, if I look on the left side here,
00:00
there's a new field, that's our IP field.
00:00
It gives us a count, so we've got
00:00
12 counts of this 205 IP.
00:00
We could also do other things like specifically search
00:00
for a result and see the events related to this.
00:00
If I just wanted to see events for
00:00
that, we can pull that out.
00:00
Or I could do things like run statistics by
00:00
IP and a lot of
00:00
other types of searches that
00:00
we'll get into in the next module.
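As a rough picture of what counting by a field means, here is a small Python sketch that tallies events per IP address, similar in spirit to running statistics by IP in a search. The events are made up for illustration, and the IPv4 pattern is deliberately simple: it does not check that each octet is in the range 0 to 255.

```python
import re
from collections import Counter

# Simple IPv4-shaped pattern; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def count_by_ip(events: list) -> Counter:
    """Count events by the first IPv4-looking value found in each one."""
    counts = Counter()
    for event in events:
        match = IPV4.search(event)
        if match:
            counts[match.group(0)] += 1
    return counts


# Hypothetical events.
events = [
    "203.0.113.10 to=bob@example.com",
    "203.0.113.10 from=alice@example.com",
    "198.51.100.7 to=carol@example.com",
    "no address in this event",
]
print(count_by_ip(events))
```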
00:00
Now that we've successfully added
00:00
data to our Splunk index, it's quiz time.
00:00
True or false, you should keep all
00:00
of your data in the same index.
00:00
The answer is false.
00:00
Breaking out your data
00:00
into different indexes can help you run
00:00
searches more efficiently and apply
00:00
different rules to different types of data.
00:00
In our next video,
00:00
we'll add even more data to Splunk by modifying
00:00
a config file on the machine where we
00:00
installed the universal forwarder.