Log Analysis • 0xnhl

Data sources such as devices generate data in the form of events. A log is a record of events that occur within an organization’s systems. Logs contain log entries and each entry details information corresponding to a single event that happened on a device or system.

Log analysis is the process of examining logs to identify events of interest
Logs help uncover the details surrounding the 5 W’s of incident investigation: who triggered the incident, what happened, when the incident took place, where the incident took place, and why the incident occurred.

Types of logs#

Depending on the data source, different log types can be produced. Here’s a list of some common log types that organizations should record:

Network: Network logs are generated by network devices like firewalls, routers, or switches.
System: System logs are generated by operating systems like Chrome OS™, Windows, Linux, or macOS®.
Application: Application logs are generated by software applications and contain information relating to the events occurring within the application such as a smartphone app.
Security: Security logs are generated by various devices or systems such as antivirus software and intrusion detection systems. Security logs contain security-related information such as file deletion.
Authentication: Authentication logs are generated whenever authentication occurs such as a successful login attempt into a computer.

Log management#

Because all devices produce logs, it can quickly become overwhelming for organizations to keep track of all the logs that are generated. To get the most value from your logs, you need to choose exactly what to log, how to access it easily, and keep it secure using log management. Log management is the process of collecting, storing, analyzing, and disposing of log data.

What to log#

The most important aspect of log management is choosing what to log. Organizations are different, and their logging requirements can differ too. It’s important to consider which log sources are most likely to contain the most useful information depending on your event of interest. This might be configuring log sources to reduce the amount of data they record, such as excluding excessive verbosity. Some information, including but not limited to phone numbers, email addresses, and names, form personally identifiable information (PII), which requires special handling and in some jurisdictions might not be possible to be logged.

The issue with overlogging#

From a security perspective, it can be tempting to log everything. This is the most common mistake organizations make. Just because it can be logged, doesn’t mean it needs to be logged. Storing excessive amounts of logs can have many disadvantages with some SIEM tools. For example, overlogging can increase storage and maintenance costs. Additionally, overlogging can increase the load on systems, which can cause performance issues and affect usability, making it difficult to search for and identify important events.

Log retention#

Organizations might operate in industries with regulatory requirements. For example, some regulations require organizations to retain logs for set periods of time and organizations can implement log retention practices in their log management policy.

Log protection#

Along with management and retention, the protection of logs is vital in maintaining log integrity. It’s not unusual for malicious actors to modify logs in attempts to mislead security teams and to even hide their activity.
Storing logs in a centralized log server is a way to maintain log integrity. When logs are generated, they get sent to a dedicated server instead of getting stored on a local machine. This makes it more difficult for attackers to access logs because there is a barrier between the attacker and the log location.

Commonly used log formats: Syslog, JSON, XML, CSV, CEF

Syslog#

Syslog is a standard for logging and transmitting data. It can be used to refer to any of its three different capabilities:

Protocol: The syslog protocol is used to transport logs to a centralized log server for log management. It uses port 514 for plaintext logs and port 6514 for encrypted logs.
Service: The syslog service acts as a log forwarding service that consolidates logs from multiple sources into a single location. The service works by receiving and then forwarding any syslog log entries to a remote server.
Log format: The syslog log format is one of the most commonly used log formats that you will be focusing on. It is the native logging format used in Unix® systems. It consists of three components: a header, structured-data, and a message.

Syslog log example#

Here is an example of a syslog entry that contains all three components: a header, followed by structured-data, and a message:
<236>1 2022-03-21T01:11:11.003Z virtual.machine.com evntslog - ID01 [user@32473 iut="1" eventSource="Application" eventID="9999"] This is a log entry!

Header
The header contains details like the timestamp; the hostname, which is the name of the machine that sends the log; the application name; and the message ID.
- Timestamp: The timestamp in this example is 2022-03-21T01:11:11.003Z, where 2022-03-21 is the date in YYYY-MM-DD format. T is used to separate the date and the time. 01:11:11.003 is the 24-hour format of the time and includes the number of milliseconds 003. Z indicates the timezone, which is Coordinated Universal Time (UTC).
- Hostname: virtual.machine.com
- Application: evntslog
- Message ID: ID01
Structured-data
The structured-data portion of the log entry contains additional logging information. This information is enclosed in square brackets and structured in key-value pairs. Here, there are three keys with corresponding values: [user@32473 iut=“1” eventSource=“Application” eventID=“9999”].
Message
The message contains a detailed log message about the event. Here, the message is This is a log entry!.
Priority (PRI)
The priority (PRI) field indicates the urgency of the logged event and is contained with angle brackets. In this example, the priority value is <236> . Generally, the lower the priority level, the more urgent the event is.

CEF (Common Event Format)#

Common Event Format (CEF) is a log format that uses key-value pairs to structure data and identify fields and their corresponding values. The CEF syntax is defined as containing the following fields:

cef
CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

plaintext

Fields are all separated with a pipe character |. However, anything in the Extension part of the CEF log entry must be written in a key-value format. Syslog is a common method used to transport logs like CEF. When Syslog is used a timestamp and hostname will be prepended to the CEF message. Here is an example of a CEF log entry that details malicious activity relating to a worm infection:

Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.2 dst=2.1.2.2 spt=1232

plaintext

Here is a breakdown of the fields:

Syslog Timestamp: Sep 29 08:26:10
Syslog Hostname: host
Version: CEF:1
Device Vendor: Security
Device Product: threatmanager
Device Version: 1.0
Signature ID: 100
Name: worm successfully stopped
Severity: 10
Extension: This field contains data written as key-value pairs. There are two IP addresses, src=10.0.0.2 and dst=2.1.2.2, and a source port number spt=1232. Extensions are not required and are optional to add.

This log entry contains details about a Security application called threatmanager that successfully stopped a worm from spreading from the internal network at 10.0.0.2 to the external network 2.1.2.2 through the port 1232. A high severity level of 10 is reported.