Script Log Parser

Parse incoming logs with a script written in the Starlark configuration language

Overview

The script log parser is in open beta starting with Panther version 1.108, and is available to all customers. Please share any bug reports and feature requests with your Panther support team.

script is one of the possible values of the parser key in a custom log schema. This parser lets you specify the transformations Panther should perform on each incoming log event using the Starlark configuration language, which shares many syntax similarities with Python. The script parser in Panther can handle both structured (JSON) and unstructured events.

You might benefit from using the script parser when you'd like to:

  • Parse unstructured logs for which the other parser options (csv, fastmatch, regex) are insufficient

  • Perform transformations on the data that the Panther-provided schema transformations do not support

Understanding the script parser

Defining a function

When using the script parser, you must implement a Starlark function. The function takes in a string and must return a non-empty dictionary. The returned dictionary defines the format of the output event.
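
As a minimal sketch of this contract, the function below takes the raw log line as a string and returns a non-empty dictionary. The key=value input format and the raw_message field name are hypothetical, chosen only for illustration; the function name parse follows the examples later on this page.

```python
def parse(log):
    # Hypothetical input: space-separated key=value pairs,
    # e.g. "user=alice action=login"
    event = {}
    for pair in log.split(" "):
        key, sep, value = pair.partition("=")
        if sep:  # skip tokens that contain no "="
            event[key] = value
    # Keep the raw line so the returned dictionary is never empty.
    event["raw_message"] = log
    return event
```

Each key in the returned dictionary becomes a field of the output event, so the keys should line up with the fields declared in the schema.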

Available functions

The script parser can use any of the primitives described in the Starlark specification. However, it is important to note that:

  • Raising exceptions is not allowed.

  • Imports are not allowed.
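
Because the script cannot raise exceptions, and Starlark has no try/except, one way to cope with unexpected input is to check its shape before indexing. The sketch below illustrates that idea only: the parse_error and raw_message field names are hypothetical, and how such events should be treated downstream is an assumption rather than documented Panther behavior.

```python
def parse(log):
    fields = log.split(" ")
    # Validate the number of tokens up front instead of relying on error handling.
    if len(fields) < 3:
        return {
            "parse_error": "expected at least 3 fields",
            "raw_message": log,
        }
    return {
        "remote_ip": fields[0],
        "identity": fields[1],
        "user": fields[2],
    }
```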

Handling JSON

While script is mainly intended to be used for text logs, it can also be used for JSON logs in cases where you want to perform transformations outside of the ones that are natively supported by Panther. For this reason, the script parser comes pre-loaded with a json module that allows you to convert JSON from type string to dictionary.

For example, the following configuration will create a new field called is_panther_employee that will be true if the actor email has the panther.com domain, and false otherwise.

parser:
  script:
    function: |
      def parse(log):
        event = json.decode(log)
        if event['actor']['email'].endswith('@panther.com'):
          event['is_panther_employee'] = True
        else:
          event['is_panther_employee'] = False
        return event

For ease of understanding, the above parse function is shown below with Python syntax highlighting:

def parse(log):
  event = json.decode(log)
  if event['actor']['email'].endswith('@panther.com'):
    event['is_panther_employee'] = True
  else:
    event['is_panther_employee'] = False
  return event
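
The function above assumes that every event contains actor.email. A more defensive variant could look up the keys with dict.get and fall back to defaults, as sketched here. The helper name add_employee_flag is hypothetical, and it works on an already-decoded dictionary so that it can be read on its own; inside a real script the dictionary would come from json.decode(log) as shown above.

```python
def add_employee_flag(event):
    # Fall back to an empty dict / empty string when a key is missing or null,
    # so the lookups below cannot fail on events without an actor or email.
    actor = event.get("actor") or {}
    email = actor.get("email") or ""
    # endswith already yields a boolean, so no if/else is needed.
    event["is_panther_employee"] = email.endswith("@panther.com")
    return event
```

This sketch still assumes that actor, when present, is a dictionary; an event where actor is a plain string would need an additional type check.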

Example using script

Imagine the following log line, using the Apache Common Log format, is sent to Panther:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

To parse this log type using script, we'll define the following function:

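def parse(log):
  fields = log.split(' ')
  return {
    'remote_ip': fields[0],
    'identity': fields[1],
    'user': fields[2],
    # The timestamp spans two tokens: '[10/Oct/2000:13:55:36' and '-0700]'
    'timestamp': ' '.join(fields[3:5]).strip('[]'),
    # The quoted request line splits into method, URI, and protocol
    'method': fields[5].strip('"'),
    'request_uri': fields[6],
    'protocol': fields[7].strip('"'),
    'status': int(fields[8]),
    'bytes_sent': int(fields[9])
  }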
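
To see where the list indexes in the function come from, it can help to look at what splitting the sample line on single spaces produces. The snippet below is plain Python used for illustration (the same string methods are described in the Starlark specification); it is a worked example, not part of the parser configuration.

```python
log = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = log.split(" ")

# Index -> token for the sample line:
#   0 '127.0.0.1'              1 '-'               2 'frank'
#   3 '[10/Oct/2000:13:55:36'  4 '-0700]'
#   5 '"GET'                   6 '/apache_pb.gif'  7 'HTTP/1.0"'
#   8 '200'                    9 '2326'

# The timestamp contains a space, so it is split across tokens 3 and 4.
# Joining them and stripping the brackets restores the original value.
timestamp = " ".join(fields[3:5]).strip("[]")
```

Here timestamp is '10/Oct/2000:13:55:36 -0700', which matches the '%d/%b/%Y:%H:%M:%S %z' time format.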

And use the following schema fields:

fields:
  - name: remote_ip
    type: string
    indicators:
      - ip
  - name: identity
    type: string
  - name: user
    type: string
  - name: timestamp
    type: timestamp
    isEventTime: true
    timeFormats:
      - '%d/%b/%Y:%H:%M:%S %z'
  - name: method
    type: string
  - name: request_uri
    type: string
  - name: protocol
    type: string
  - name: status
    type: int
  - name: bytes_sent
    type: bigint

After the log above is normalized with this parser, it becomes:

{
    "bytes_sent": 2326,
    "identity": "-",
    "method": "GET",
    "protocol": "HTTP/1.0",
    "remote_ip": "127.0.0.1",
    "request_uri": "/apache_pb.gif",
    "status": 200,
    "timestamp": "2000-10-10 20:55:36.000000000",
    "user": "frank"
}
