Raw Log Anatomy: Understanding My SIEM System

February 26, 2017

Raw Log Anatomy: My SIEM system reads my raw logs, why do I need to understand them?

NOTE: Examples used in this post are very old, but the principles remain sound. I had a choice between using very old logs that I could leave whole, or making significant changes to newer logs to obfuscate identifiable information.

Some Initial Notes About Raw Logs

  • There is often a MASSIVE difference in format between different systems that serve the same function (PIX vs NetGear, both firewalls, is an example later in the post). It is important to understand the fields and information contained within those raw logs.

  • People, even people in security, don’t always understand what is in the original logs. It is common to underestimate how MUCH information there is. Relying completely on the correlated output from SIEM systems does not reveal all the applicable information, or allow understanding of the entire situation.

  • Understanding the log source equipment and function helps explain the difference between logs – Firewall vs server; proxy vs Web Application Firewall, etc. It helps to have some experience digging into logs for troubleshooting and other purposes. It is also important to have an understanding of the environment in which the logs were created.

  • A note about syslog, LEEF (Log Event Extended Format), and CEF (Common Event Format): These are reasonably standardized log formats, with fields in the same order regardless of which device is sending the logs. Devices can send logs formatted according to the defined standard, which allows for easy log parsing and analysis. There are additional standardized log formats in use; search online for more information on them.

“Raw Logs”, why do I care?

Question: Raw logs are usually ugly and boring, so why should we, as analysts or security experts, care what they look like? A SIEM (Security Information and Event Management) system’s job is to interpret those, right?

Answer: To function properly, a SIEM system first needs someone (an engineer, a software programmer, or often US) to tell it how to interpret those logs. Also, knowing how to read a log is a valuable skill that often comes in handy when troubleshooting!

Anatomy of a Log Entry

Let’s start with the anatomy of a log entry. Logs consist of different fields, each containing a specific information type.

Fields in log entries can be very different between devices. Logs sometimes have different fields, the data may be formatted differently between log types, and the fields are typically in different positions.

See the examples for IIS and for Checkpoint Firewall at the end for two log types with dramatically different fields.

This lack of consistency can make it difficult to interpret logs from different devices.

At first glance, a Cisco PIX log looks nothing like a NetGear Firewall log, even though they’re the same kind of device:

 

NetGear Firewall Log Entry Example:

Sun, 2004-03-28 13:52:46 – TCP packet – Source:81.248.19.27,60001 ,WAN – Destination:217.224.147.21,4467 ,LAN [Drop] – [TCP preconnect traffic dropped]

Cisco PIX firewall log entry example:

Mar 29 2004 09:54:30: %PIX-6-106015: Deny TCP (no connection) from 192.168.0.2/2794 to 192.168.216.1/2357 flags SYN ACK on interface inside

 

Once broken down, the anatomy of the two log sources is actually nearly identical:

<<Timestamp>> <<message type>> <<source>> <<destination>> <<message>>

NetGear Example:

<<Sun, 2004-03-28 13:52:46>> <<TCP packet>> <<Source:81.248.19.27,60001 ,WAN>> <<Destination:217.224.147.21,4467 ,LAN>> <<[Drop] – [TCP preconnect traffic dropped]>>

PIX Example:

<<Mar 29 2004 09:54:30>> <<Deny TCP (no connection)>> <<from 192.168.0.2/2794>> <<to 192.168.216.1/2357>> <<flags SYN ACK on interface inside>>

For clarity, the fields in these logs are, in order:

  • Timestamp

  • Message type

  • Source IP

  • Destination IP

  • Message
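The shared layout above can be sketched in code. This is a minimal illustration, not how any particular SIEM parses these devices: the regular expressions, the `normalize` function, and the field names (`src_ip`, `dst_port`, and so on) are assumptions made for this example, and they are written only against the two sample lines in this post. The NetGear sample is reproduced with plain hyphens, on the assumption that the dashes shown above are typographic substitutions.

```python
import re

# Sample lines from this post (NetGear dashes assumed to be ASCII hyphens).
NETGEAR_LINE = (
    "Sun, 2004-03-28 13:52:46 - TCP packet - Source:81.248.19.27,60001 ,WAN - "
    "Destination:217.224.147.21,4467 ,LAN [Drop] - [TCP preconnect traffic dropped]"
)
PIX_LINE = (
    "Mar 29 2004 09:54:30: %PIX-6-106015: Deny TCP (no connection) "
    "from 192.168.0.2/2794 to 192.168.216.1/2357 flags SYN ACK on interface inside"
)

# Hypothetical pattern for the NetGear layout; [-\u2013] accepts a hyphen or en dash.
NETGEAR_RE = re.compile(
    r"^(?P<timestamp>\w{3}, \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [-\u2013] "
    r"(?P<message_type>.+?) [-\u2013] "
    r"Source:(?P<src_ip>[\d.]+),(?P<src_port>\d+) ,\w+ [-\u2013] "
    r"Destination:(?P<dst_ip>[\d.]+),(?P<dst_port>\d+) ,\w+ "
    r"(?P<message>.+)$"
)

# Hypothetical pattern for the PIX layout. The %PIX-6-106015 tag is matched
# but not captured, mirroring the field breakdown in the post.
PIX_RE = re.compile(
    r"^(?P<timestamp>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}): "
    r"%PIX-\d-\d+: "
    r"(?P<message_type>.+?) "
    r"from (?P<src_ip>[\d.]+)/(?P<src_port>\d+) "
    r"to (?P<dst_ip>[\d.]+)/(?P<dst_port>\d+) "
    r"(?P<message>.+)$"
)


def normalize(line):
    """Return a dict of common fields, or None if no known layout matches."""
    for pattern in (NETGEAR_RE, PIX_RE):
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None


if __name__ == "__main__":
    for sample in (NETGEAR_LINE, PIX_LINE):
        print(normalize(sample))
```

Both lines come out with the same set of keys, which is the point of the exercise: once each device has its own pattern, everything downstream can work with one field layout.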

Log Field Types and Data Field Separation

Next, let’s examine the common types of log fields and their meaning.

  • Timestamp (when did this happen?)

  • Message Type (This can be confusing as it varies considerably depending on the log source device type. This is where practice, reading logs, comes in handy! “Deny” and “Allow” are common firewall message types.)

  • IP Address (sometimes both Source and Destination; where did this happen?)

  • Message (what actually happened?)

  • Not always present: Device name or identifier (“Cisco”, hostname, etc. Who is reporting what happened?)

Log Field Separators

Log fields need some indicator of where one field ends and the next begins. Common separators are spaces, tabs, commas, colons, dashes, and semicolons.
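As a small illustration of separators at work, the sketch below splits the "key: value;" portion of the first Checkpoint sample at the end of this post. The function name `split_pairs` is made up for this example, and the approach assumes that values never contain a semicolon.

```python
# Key/value tail copied from the first Checkpoint sample later in this post.
CHECKPOINT_TAIL = (
    "rule: 134; rule_uid: {11111111-2222-3333-BD17-711F536C7C33}; "
    "dst: 255.255.255.255; proto: udp; product: VPN-1 & FireWall-1; "
    "service: 67; s_port: 68;"
)


def split_pairs(text):
    """Split 'key: value; key: value;' text into a dict of strings."""
    pairs = {}
    for chunk in text.split(";"):      # semicolons separate the fields
        chunk = chunk.strip()
        if not chunk:                  # skip the empty piece after the last ';'
            continue
        # Split on the FIRST colon only, so a value may itself contain colons.
        key, _, value = chunk.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


if __name__ == "__main__":
    for key, value in split_pairs(CHECKPOINT_TAIL).items():
        print(f"{key} = {value}")
```

Splitting on only the first colon is a deliberate choice: timestamps and some addresses contain colons of their own, and a naive split on every colon would break them apart.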

 

Log Timestamps

Logs often use different timestamp formats. They also vary in time zone and 12 vs 24 hour format.

Some examples of different formats:

Mar 29 2004 09:54:32

Sun, 2004-03-28 13:52:46

07/Mar/2004:16:06:51
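One way to cope with mixed formats is to try each known format in turn. The sketch below does this for the three examples above using Python's standard `datetime.strptime`. The `FORMATS` list and the `parse_timestamp` function are names invented for this example, and the month and weekday abbreviations assume an English locale.

```python
from datetime import datetime

# One strptime format string per example timestamp above.
FORMATS = (
    "%b %d %Y %H:%M:%S",      # Mar 29 2004 09:54:32
    "%a, %Y-%m-%d %H:%M:%S",  # Sun, 2004-03-28 13:52:46
    "%d/%b/%Y:%H:%M:%S",      # 07/Mar/2004:16:06:51
)


def parse_timestamp(text):
    """Try each known format; raise ValueError if none of them fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {text!r}")


if __name__ == "__main__":
    for raw in ("Mar 29 2004 09:54:32",
                "Sun, 2004-03-28 13:52:46",
                "07/Mar/2004:16:06:51"):
        print(raw, "->", parse_timestamp(raw).isoformat())
```

None of these three formats carries a time zone, so the resulting values are "naive": knowing which zone the device was logging in is still up to the analyst.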

 

On occasion you will find timestamps without dates:

03:01:06 127.0.0.1 GET /images/sponsered.gif 304

(This is an IIS log)

A special note about Unix/Linux “epoch” timestamps!

Epoch timestamps are just strings of numbers: 1431041867

These can be run through epoch timestamp converters to get human-readable output. Depending on your *nix environment, you may be able to do this at the command line. GNU date, found on most Linux systems, will accept

# date -d @<epoch_timestamp>

while BSD-derived systems typically use

# date -r <epoch_timestamp>

or some variant. As a matter of interest, the epoch timestamp is the number of seconds that have passed since 00:00:00 UTC on January 1, 1970, a starting point commonly called the Unix epoch.
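The same conversion can be done in a few lines of Python when no converter is handy. This is a minimal sketch using only the standard library; the function name `epoch_to_utc` is made up for this example. It treats the number as seconds since 1970-01-01 00:00:00 UTC and reports the result in UTC rather than local time.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert seconds since 1970-01-01 00:00:00 UTC to a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # The example value from the post.
    print(epoch_to_utc(1431041867).isoformat())
```

Passing an explicit time zone avoids a common source of confusion: without it, the result depends on the time zone of the machine doing the conversion, not the machine that wrote the log.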

Operating System Logs vs Network or Security Device Logs

Operating system logs are markedly different from network or security device logs. They are primarily concerned with what is going on with that specific machine, not with the network or environment.

There are some exceptions – Windows Filtering Platform (firewall) for example.

Important Note Regarding Microsoft OS (Windows) Logs

MS Server Security Logs: 2003 vs 2008

With the advent of Windows Server 2008, the Security log format and event codes changed. Specifically, what used to be a 3-digit* Event ID became a 4-digit Event ID for 2008 and beyond. For example, a successful logon was Event ID 528 on Server 2003 and is Event ID 4624 on Server 2008 and later.

Lists of both sets of Event IDs are available through Microsoft’s TechNet.

*There are also 2-digit Event ID codes for System Events.

Additional Log Examples

Here are some additional log samples:

Example – HP-UX

Mar 12 11:44:20 server7 ftpd[25306]: command: QUIT^M

Mar 12 11:44:20 server7 ftpd[25306]: <— 221

Mar 12 11:44:20 server7 ftpd[25306]: Goodbye.

Mar 12 11:44:35 server7 tftpd[24955]: Timeout (no requests in 10 minutes)

Mar 12 12:17:03 server7 sshd[26501]: pam_authenticate: error Authentication failed

Mar 12 12:17:03 server7 sshd[26501]: Accepted publickey for user from 111.222.333.444 port 32774 ssh2

 

The field layout of the HP-UX log entry:

<<timestamp>> <<source name/hostname>> <<message source (daemon and process ID)>> <<message>>
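That layout maps fairly directly onto a regular expression. The pattern and the `parse_syslog_line` function below are assumptions made for this example, written against the sample lines above rather than against any formal syslog specification.

```python
import re

# Hypothetical pattern: timestamp, hostname, daemon[pid]:, then free-text message.
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<daemon>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict of fields for one syslog-style line, or None."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "Mar 12 11:44:20 server7 ftpd[25306]: command: QUIT^M",
        "Mar 12 12:17:03 server7 sshd[26501]: pam_authenticate: error Authentication failed",
    ]
    for sample in samples:
        print(parse_syslog_line(sample))
```

Note that this timestamp has no year, so anything that stores these events long term has to supply one from context, such as the date the log was collected.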

Here are some additional examples, without the field layout breakdown. Try doing the breakdown yourself:

 

Apache log samples

192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com "http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows…" "-"

192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com "http://www.search.com/index.html" "Mozilla/5.0 (Windows…" "-"

192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; …)" "-"

64.242.88.10 - - [07/Mar/2004:16:45:56 -0800] "GET /twiki/bin/attach/Main/PostfixCommands HTTP/1.1" 401 12846

64.242.88.10 - - [07/Mar/2004:16:47:12 -0800] "GET /robots.txt HTTP/1.1" 200 68
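Once you have tried the breakdown yourself, here is one possible version to compare against. It is a sketch that covers only the leading fields the Apache samples have in common (client, two identity fields, timestamp, request, status, size) and ignores anything after them. The pattern, the function name `parse_access_line`, and the field names are assumptions made for this example, and the code expects plain ASCII quotes and hyphens, as Apache writes them.

```python
import re
from datetime import datetime

# Hypothetical pattern for the leading fields shared by the Apache samples.
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Parse the leading fields of one access-log line, or return None."""
    match = ACCESS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # %z reads the numeric UTC offset, e.g. -0800.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields


if __name__ == "__main__":
    line = ('64.242.88.10 - - [07/Mar/2004:16:47:12 -0800] '
            '"GET /robots.txt HTTP/1.1" 200 68')
    print(parse_access_line(line))
```

Unlike the firewall timestamps earlier, this one carries its UTC offset, so the parsed value is unambiguous about when the request happened.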

 

IIS Log Sample (Compare against Checkpoint for structure)

02:49:12 127.0.0.1 GET / 200

02:49:35 127.0.0.1 GET /index.html 200

03:01:06 127.0.0.1 GET /images/sponsered.gif 304

03:52:36 127.0.0.1 GET /search.php 200

04:17:03 127.0.0.1 GET /admin/style.css 200

05:04:54 127.0.0.1 GET /favicon.ico 404

05:38:07 127.0.0.1 GET /js/ads.js 200

 

Checkpoint Firewall Log Samples (Compare against IIS for structure)

Sep 3 15:10:54 192.168.99.1 Checkpoint: 3Sep2007 15:10:52 drop 192.168.99.1 >eth8 rule: 134; rule_uid: {11111111-2222-3333-BD17-711F536C7C33}; dst: 255.255.255.255; proto: udp; product: VPN-1 & FireWall-1; service: 67; s_port: 68; Sep 3 15:11:40 192.168.99.1

Apr 11 11:04:48 hostng Checkpoint: 21Aug2007 12:00:00 accept 10.10.10.2 >eth0 rule: 100; rule_uid: {00000000-0000-0000-0000-000000000000}; service_id: nbdatagram; src: 10.10.10.3; dst: 10.10.10.255; proto: udp; product: VPN-1 & FireWall-1; service: 138; s_port: 138;

9 Comments
  1. Really great idea to collate this info in a nice simple to interpret article. Great post.

  2. This a great article.One thing i will like to know is that does all SIEM tool follow the same process when it comes analysing data. We just started using SPLUNK in my enviroment and the security team wants me to start onboarding all the security log on to the splunk system.

    • Hi Olu,

      I think what you’re asking about is parsing. In other words, do all SIEM systems extract the same fields in the same way from the logs. The answer is generally yes, but as you learn different SIEM systems you will determine that each SIEM has differences and quirks about how it imports data from logs.
      Something to keep in mind as you learn and explore Splunk is that it’s not truly a SIEM. It’s a broad-use data management and analysis tool, with a variety of applications. This is both good and bad, as it has an amazing amount of flexibility and capacity, which can make it a bit overwhelming. I work with Splunk in its capacity as a security data analysis system myself.

      Best of luck!

  3. Great article, but you need to emphasize on the fact that “the more you get logs and interpret them the more you’re able to understand the situation (in SIEM)
    And the log parsing is even more critical and important step than threats detection/intelligence because it is the main phase after log collection and any security analyst needs to parse it as detailed as possible because some times (and believe me it’s always the case) you are going to build your correlation rules based on those fields (it can be a flag in a TCP/UDP sniffed packet)

    • Sorry the reply took so long, Cybrary comments posting was having some sort of php error over the past few days.
      Great points, Wolf Man!
      This article was written to go over logs and their formats and I kind of skimmed over the “Why” a little bit. I have plans for writing more articles about how SIEM works, but maybe you’d like to put a submission together to go over your points here in more detail! I’d like to read that, so definitely send me a note if you do make a submission.

      Thanks for the feedback!

  4. Excellent Article. Thanks.

  5. Nice post – thanks for your efforts.
