# CSV Log Parser

## Overview

The `csv` log parser converts each row of a CSV file into a simple JSON object that maps column names to values. To do this, each column must be given a name.
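As a minimal illustration of that mapping (a Python sketch, not the parser itself; the column names and sample row are assumptions):

```python
import csv
import io
import json

# Hypothetical column names for a three-column CSV (illustration only)
columns = ["year", "action", "message"]

row = next(csv.reader(io.StringIO('2020,SEND,"PING"')))
event = dict(zip(columns, row))
print(json.dumps(event))
# {"year": "2020", "action": "SEND", "message": "PING"}
```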

## CSV logs without header

To parse CSV logs without a header row, Panther needs to know which names to assign to each column.

Let's assume our logs are CSV with 7 columns: year, month, day, time, action, ip\_address, message. Some example rows of this file could be:

```
# Access logs for 20200901
2020,09,01,10:35:23, SEND ,192.168.1.3,"PING"
2020,09,01,10:35:25, RECV ,192.168.1.3,"PONG"
2020,09,01,10:35:25, RESTART ,-,"System restarts"
```

We would use the following *LogSchema* to define the log type:

{% tabs %}
{% tab title="Console" %}
In the Panther Console, we would follow the [How to create a custom schema manually instructions](https://docs.panther.com/data-onboarding/custom-log-types/..#how-to-create-a-custom-schema-manually), selecting the **CSV** parser.

<figure><img src="https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2Fgit-blob-b5f84896b7f48030c46ade5b0f02b48115d148ce%2Fimage.png?alt=media" alt="Under a &#x22;Parser&#x22; header is a form with various fields, including &#x22;Has Header,&#x22; &#x22;Delimiter,&#x22; &#x22;Columns,&#x22; &#x22;Empty Values,&#x22; and more."><figcaption></figcaption></figure>

In the **Fields & Indicators** section (below the **Parser** section shown in the screenshot above), we would define the fields:

```yaml
fields:
- name: timestamp
  type: timestamp
  timeFormats:
    - rfc3339
  isEventTime: true
  required: true
- name: action
  type: string
  required: true
- name: ip_address
  type: string
  indicators: [ip]
- name: message
  type: string
```

{% endtab %}

{% tab title="Full YAML representation" %}

```yaml
parser:
  csv:
    # CSV files come in many flavors and you can choose the delimiter character to split each row
    delimiter: ","
    # Trim space surrounding values
    trimSpace: true
    # Names in the 'columns' array will be mapped to columns in each row.
    # If you want to skip a column, you can set the name at the same index to an empty string ("")
    columns:
    - year
    - month
    - day
    - time
    - action
    - ip_address
    - message
    # Skip lines that start with the prefix (e.g. comments)
    skipPrefix: "# "
    # CSV files sometimes use placeholder values for missing or N/A data.
    # You can define such values with 'emptyValues' and they will be ignored.
    emptyValues: ["-"]
    # The 'expandFields' directive will render a template string injecting generated fields into the key/value pairs
    expandFields:
      # Since the timestamp is split across multiple columns, we need to re-assemble it into RFC3339 format
      # The following will add a 'timestamp' field by replacing the fields from CSV values
      timestamp: '%{year}-%{month}-%{day}T%{time}Z'
fields:
- name: timestamp
  type: timestamp
  timeFormats:
    - rfc3339
  isEventTime: true
  required: true
- name: action
  type: string
  required: true
- name: ip_address
  type: string
  indicators: [ip]
- name: message
  type: string
```

{% endtab %}
{% endtabs %}
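The behavior of the configuration above can be approximated with a short Python sketch. This is an illustration of the parser's logic on the example rows, not Panther's actual implementation:

```python
import csv

COLUMNS = ["year", "month", "day", "time", "action", "ip_address", "message"]
SKIP_PREFIX = "# "
EMPTY_VALUES = {"-"}

LOG = """\
# Access logs for 20200901
2020,09,01,10:35:23, SEND ,192.168.1.3,"PING"
2020,09,01,10:35:25, RESTART ,-,"System restarts"
"""

def parse(text):
    # skipPrefix: drop comment lines before CSV parsing
    lines = [l for l in text.splitlines() if not l.startswith(SKIP_PREFIX)]
    events = []
    for row in csv.reader(lines):
        # trimSpace: strip whitespace surrounding each value,
        # then map column names to values
        event = dict(zip(COLUMNS, (v.strip() for v in row)))
        # expandFields: re-assemble the RFC3339 timestamp
        # from the split date/time columns
        event["timestamp"] = "{year}-{month}-{day}T{time}Z".format(**event)
        # emptyValues: drop placeholder values such as "-"
        events.append({k: v for k, v in event.items() if v not in EMPTY_VALUES})
    return events

events = parse(LOG)
print(events[0]["timestamp"])  # 2020-09-01T10:35:23Z
```

Note how the second event carries no `ip_address` key at all, because its `-` placeholder was treated as an empty value.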

## CSV logs with header

To parse CSV files that start with a header row, there are two options:

* Use the names defined in the header as the names for the JSON fields
* Disregard the header and define column names explicitly the [same way you would for header-less CSV files](#csv-logs-without-header)

{% hint style="warning" %}
If your CSV files have a header and you do not explicitly define columns (letting them instead be defined by the header), do not combine this schema with other schemas in the same log source (or single S3 prefix). Doing so could cause logs to be improperly classified.
{% endhint %}

To use the names in the header, the parser configuration should be:

```yaml
parser:
  csv:
    delimiter: "," 
    # Setting 'hasHeader' to true without specifying a 'columns' field
    # tells Panther to take the column names from the values in the header.
    hasHeader: true
    # In case you want to rename a column you can use the 'expandFields' directive
    expandFields:
      # Let's assume that the header contains '$cost' as column name and you want to 'normalize' it as 'cost_us_dollars'
      "cost_us_dollars": '%{$cost}'
```
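Under this configuration, the first row supplies the field names. A hedged Python sketch of that behavior (the `$cost` header and the sample data are assumptions for illustration):

```python
import csv
import io

LOG = """\
item,$cost
widget,9.99
"""

# hasHeader without 'columns': the first row becomes the keys
reader = csv.DictReader(io.StringIO(LOG))
events = []
for event in reader:
    # expandFields-style rename: derive 'cost_us_dollars' from '$cost'
    event["cost_us_dollars"] = event["$cost"]
    events.append(event)

print(events[0]["cost_us_dollars"])  # 9.99
```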

To ignore the header and define your own names for the columns, use:

```yaml
parser:
  csv:
    delimiter: "," 
    # Setting 'hasHeader' to true while also specifying a 'columns' field
    # tells Panther to ignore the header and use the names in the 'columns' array.
    hasHeader: true
    columns:
    - foo
    - bar
    - baz
```
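A sketch of the same behavior (the header and sample row are assumptions for illustration):

```python
import csv
import io

COLUMNS = ["foo", "bar", "baz"]

LOG = """\
col_a,col_b,col_c
1,2,3
"""

rows = list(csv.reader(io.StringIO(LOG)))
# hasHeader with explicit 'columns': skip the header row
# and use our own names instead of col_a/col_b/col_c
events = [dict(zip(COLUMNS, row)) for row in rows[1:]]

print(events[0])  # {'foo': '1', 'bar': '2', 'baz': '3'}
```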
