Panther parses CSV files by converting each row into a simple JSON object that maps keys to values. To do that, each column must be given a name.
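For instance, with columns named method, path, and status, a row such as GET,/index.html,200 would map to a JSON object like {"method": "GET", "path": "/index.html", "status": "200"} (hypothetical values; the final type of each field depends on the field declarations in the schema).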
CSV logs without header
To parse CSV logs without a header row, Panther needs to know which names to assign to each column.
Let's assume our logs are CSV files with seven columns: year, month, day, time, action, ip_address, message. Some example rows of such a file could be:
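For illustration, such rows might look like this (hypothetical values, with "-" standing in for a missing message):

  2021,06,29,08:12:45,ALLOW,192.168.1.10,GET /index.html
  2021,06,29,08:12:47,DENY,10.0.0.5,-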
We would use the following LogSchema to define the log type:
parser:
  csv:
    # CSV files come in many flavors and you can choose the delimiter character to split each row
    delimiter: ","
    # Names in the 'columns' array will be mapped to columns in each row.
    # If you want to skip a column, you can set the name at the same index to an empty string ("")
    columns:
      - year
      - month
      - day
      - time
      - action
      - ip_address
      - message
    # CSV files sometimes use placeholder values for missing or N/A data.
    # You can define such values with 'emptyValues' and they will be ignored.
    emptyValues: ["-"]
    # The 'expandFields' directive will render a template string injecting generated fields into the key/value pairs
    expandFields:
      # Since the timestamp is split across multiple columns, we need to re-assemble it into RFC3339 format
      # The following will add a 'timestamp' field by replacing the fields from CSV values
      timestamp: '%{year}-%{month}-%{day}T%{time}Z'
fields:
  - name: timestamp
    type: timestamp
    timeFormats:
      - rfc3339
    isEventTime: true
    required: true
  - name: action
    type: string
    required: true
  - name: ip_address
    type: string
    indicators: [ip]
  - name: message
    type: string
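As a rough sketch of the result, the first hypothetical row above would be parsed into an event containing the declared fields along these lines (any metadata fields Panther adds on its own are omitted here):

  {
    "timestamp": "2021-06-29T08:12:45Z",
    "action": "ALLOW",
    "ip_address": "192.168.1.10",
    "message": "GET /index.html"
  }

Note how the year, month, day, and time columns are only used to assemble the timestamp field via expandFields; they are not declared under fields themselves.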
CSV logs with header
Avoid using such schemas in combination with other schemas on the same data source; use a separate source or S3 prefix instead.
To parse CSV logs that start with a header row, Panther offers two options:
Use the names defined in the header as the names for the JSON fields, or
Skip the header and define the names the same way we did for headerless CSV files
To use the names in the header, the parser configuration should be:
parser:
  csv:
    delimiter: ","
    # Setting 'hasHeader' to true without specifying a 'columns' field,
    # tells Panther to set the column names from values in the header.
    hasHeader: true
    # In case you want to rename a column you can use the 'expandFields' directive
    expandFields:
      # Let's assume that the header contains '$cost' as column name and you want to 'normalize' it as 'cost_us_dollars'
      "cost_us_dollars": '%{$cost}'
To ignore the header and define your own set of column names, use:
parser:
  csv:
    delimiter: ","
    # Setting 'hasHeader' to true while also specifying a 'columns' field,
    # tells Panther to ignore the header and use the names in the 'columns' array
    hasHeader: true
    columns:
      - foo
      - bar
      - baz
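With that configuration, a hypothetical file such as:

  first_name,last_name,age
  jane,doe,34

would have its header row skipped, and the data row's values would be exposed under the names foo, bar, and baz rather than the original header names (subject, as before, to which of those names you declare under fields).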