CSV Log Parser

Overview

The CSV log parser converts each row of a CSV file into a simple JSON object that maps keys to values. To do that, each column must be given a name.
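As a rough illustration of this row-to-object mapping, here is a minimal Python sketch. It is not Panther's implementation; the function name `csv_to_json_lines` and the sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_json_lines(text: str, columns: list[str]) -> list[str]:
    """Turn each CSV row into one JSON object keyed by the column names."""
    out = []
    for row in csv.reader(io.StringIO(text)):
        # Pair the i-th column name with the i-th value in the row.
        # In this sketch zip() stops at the shorter sequence, so extra values are dropped.
        out.append(json.dumps(dict(zip(columns, row))))
    return out


for line in csv_to_json_lines("alice,login\nbob,logout\n", ["user", "action"]):
    print(line)
```

Each printed line is one JSON object, for example `{"user": "alice", "action": "login"}`.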

CSV logs without header

To parse CSV logs without a header row, Panther needs to know which names to assign to each column.

Let's assume our logs are CSV files with 7 columns: year, month, day, time, action, ip_address, message. Some example rows from such a file could be:

# Access logs for 20200901
2020,09,01,10:35:23, SEND ,192.168.1.3,"PING"
2020,09,01,10:35:25, RECV ,192.168.1.3,"PONG"
2020,09,01,10:35:25, RESTART ,-,"System restarts"

To define the log type, we would create a LogSchema. In the Panther Console, we would follow the How to create a custom schema manually instructions, selecting the CSV parser.

In the Fields & Indicators section (below the Parser section), we would define the fields:

fields:
- name: timestamp
  type: timestamp
  timeFormats:
    - rfc3339
  isEventTime: true
  required: true
- name: action
  type: string
  required: true
- name: ip_address
  type: string
  indicators: [ip]
- name: message
  type: string
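Note that `timestamp` is not one of the seven columns, so the schema has to assemble it from `year`, `month`, `day` and `time`; the `expandFields` directive shown in the next section builds a field from column values and may be one way to do this. The hypothetical Python sketch below shows the mapping we want for the example rows. Skipping the `#` comment line, trimming spaces around values, treating `-` as an empty value, and assuming UTC times are all assumptions of this sketch, not documented parser behavior.

```python
import csv

# The seven column names, in file order (from the example above).
COLUMNS = ["year", "month", "day", "time", "action", "ip_address", "message"]

SAMPLE = """# Access logs for 20200901
2020,09,01,10:35:23, SEND ,192.168.1.3,"PING"
2020,09,01,10:35:25, RECV ,192.168.1.3,"PONG"
2020,09,01,10:35:25, RESTART ,-,"System restarts"
"""


def parse_headerless(text: str) -> list[dict]:
    """Map each row to the schema fields: timestamp, action, ip_address, message."""
    # Assumption: lines starting with '#' are comments, not data rows.
    lines = [line for line in text.splitlines() if line and not line.startswith("#")]
    events = []
    for row in csv.reader(lines):
        # Assumption: padding such as " SEND " is not meaningful, so strip it.
        record = dict(zip(COLUMNS, (value.strip() for value in row)))
        events.append({
            # Assumption: times are UTC, so the RFC 3339 string ends in 'Z'.
            "timestamp": f"{record['year']}-{record['month']}-{record['day']}T{record['time']}Z",
            "action": record["action"],
            # Assumption: '-' means "no value"; ip_address is not a required field.
            "ip_address": None if record["ip_address"] == "-" else record["ip_address"],
            "message": record["message"],
        })
    return events


for event in parse_headerless(SAMPLE):
    print(event)
```

The first row becomes `timestamp` "2020-09-01T10:35:23Z", `action` "SEND", `ip_address` "192.168.1.3" and `message` "PING", which matches the `rfc3339` time format declared for `timestamp`.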

CSV logs with header

Avoid using schemas for CSV logs with a header in combination with other schemas. Use a separate source or S3 prefix for these logs.

To parse CSV logs that start with a header row, Panther has two options:

  • Use the names defined in the header as the names for the JSON fields, or

  • Skip the header and define the names the same way we did for headerless CSV files.

To use the names in the header, configure the parser as follows:

parser:
  csv:
    delimiter: ","
    # Setting 'hasHeader' to true without specifying a 'columns' field
    # tells Panther to take the column names from the values in the header.
    hasHeader: true
    # To rename a column, use the 'expandFields' directive.
    expandFields:
      # Assume the header contains '$cost' as a column name and you want to normalize it to 'cost_us_dollars'
      "cost_us_dollars": '%{$cost}'
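The header-based configuration can be pictured with the Python sketch below, which takes field names from the header row and then adds `cost_us_dollars` by filling in the `%{$cost}` template. It only models the behavior described in the comments above; the helper names and sample data are hypothetical, and keeping the original `$cost` key alongside the new one is an assumption of the sketch.

```python
import csv
import io
import re

# Matches '%{name}' placeholders like the one in the expandFields example.
PLACEHOLDER = re.compile(r"%\{([^}]+)\}")


def expand(template: str, record: dict[str, str]) -> str:
    """Replace each %{name} in the template with record[name]."""
    return PLACEHOLDER.sub(lambda m: record[m.group(1)], template)


def parse_with_header(text: str, expand_fields: dict[str, str]) -> list[dict[str, str]]:
    records = []
    # DictReader takes the field names from the first (header) row.
    for record in csv.DictReader(io.StringIO(text)):
        # Add each expanded field next to the original columns.
        for name, template in expand_fields.items():
            record[name] = expand(template, record)
        records.append(record)
    return records


print(parse_with_header("item,$cost\nbook,12.50\n", {"cost_us_dollars": "%{$cost}"}))
```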

To ignore the header and define your own names for the columns, use:

parser:
  csv:
    delimiter: ","
    # Setting 'hasHeader' to true while also specifying a 'columns' field
    # tells Panther to ignore the header and use the names in the 'columns' array.
    hasHeader: true
    columns:
    - foo
    - bar
    - baz
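Similarly, here is a minimal Python sketch of ignoring the header: the first row is read and discarded, and the names from `columns` are paired with each remaining row. The function name is hypothetical and this is not Panther's code.

```python
import csv
import io


def parse_ignoring_header(text: str, columns: list[str]) -> list[dict[str, str]]:
    reader = csv.reader(io.StringIO(text))
    # Discard the header row; its names are not used.
    next(reader, None)
    # Every remaining row is keyed by the configured column names.
    return [dict(zip(columns, row)) for row in reader]


print(parse_ignoring_header("a,b,c\n1,2,3\n", ["foo", "bar", "baz"]))
```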
