# Script Log Parser

## Overview

`script` is one of the possible values of the [`parser` key](https://docs.panther.com/data-onboarding/reference#parserspec) in a custom log schema. This parser lets you specify the transformations Panther should perform on each incoming log event using the [Starlark configuration language](https://bazel.build/rules/language), which shares many syntax similarities with Python. The `script` parser in Panther can handle both structured (JSON) and unstructured events.

You might benefit from using the `script` parser when you'd like to:

* Parse unstructured logs for which the other parser options ([`csv`](https://docs.panther.com/data-onboarding/custom-log-types/csv-parser), [`fastmatch`](https://docs.panther.com/data-onboarding/custom-log-types/fastmatch-parser), [`regex`](https://docs.panther.com/data-onboarding/custom-log-types/regex-parser)) are insufficient
* Perform transformations on the data that the Panther-provided [schema transformations](https://docs.panther.com/data-onboarding/custom-log-types/transformations) do not support

## Understanding the `script` parser

### Defining a `function`

When using the `script` parser, you must implement a Starlark `function`. The function takes a single [string](https://github.com/google/starlark-go/blob/master/doc/spec.md#strings) (the incoming log event) and must return a non-empty [dictionary](https://github.com/google/starlark-go/blob/master/doc/spec.md#dictionaries). The keys and values of the returned dictionary define the format of the output event.
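As a minimal sketch of this contract, the function below turns a line of space-separated `key=value` pairs into a dictionary. The log format and the `raw` fallback field are assumptions made for illustration; they are not part of Panther. The fallback is only there so that the function never returns an empty dictionary.

```python
def parse(log):
  # Build the output event from "key=value" pairs separated by spaces
  event = {}
  for pair in log.split(" "):
    if "=" in pair:
      # Split on the first "=" only, so values may themselves contain "="
      key, value = pair.split("=", 1)
      event[key] = value
  # The returned dictionary must be non-empty, so fall back to the raw line
  # (hypothetical field name) when no pairs were found
  if len(event) == 0:
    event["raw"] = log
  return event
```

For an input such as `user=alice action=login`, this returns a dictionary with the keys `user` and `action`.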

### Available functions

The `script` parser can use any of the primitives described in the [Starlark specification](https://github.com/google/starlark-go/blob/master/doc/spec.md). Additionally, you can use the following functions:

<table><thead><tr><th width="177.61370849609375">Function name</th><th>Description</th></tr></thead><tbody><tr><td>json.decode</td><td>Decodes a JSON string to a dictionary</td></tr><tr><td>json.encode</td><td>Encodes a dictionary to a JSON string</td></tr><tr><td>base64.decode</td><td>Decodes a base64-encoded string</td></tr><tr><td>base64.encode</td><td>Performs base64 encoding on a string</td></tr></tbody></table>
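The sketch below combines `json.decode` and `json.encode` to keep a nested object as a JSON string field. It is written so that it also runs as plain Python for local experimentation: the small `json` class is a local stand-in, not part of Panther, and it assumes the preloaded functions behave like Python's `json.loads` and `json.dumps` for simple objects. In a real schema, only the `parse` function would be used, since imports are not allowed in the script. The field names `details` and `details_json` are hypothetical.

```python
# Local-only stand-in for the preloaded Starlark `json` module (assumption).
# Do not include this part in a schema: imports are not allowed there.
import json as _pyjson

class json:
  decode = staticmethod(_pyjson.loads)
  encode = staticmethod(_pyjson.dumps)

# The parse function itself uses only json.decode and json.encode
def parse(log):
  event = json.decode(log)
  # Store the nested object as a JSON string (hypothetical field names)
  if 'details' in event:
    event['details_json'] = json.encode(event['details'])
  return event

result = parse('{"id": 7, "details": {"os": "linux", "tags": ["a", "b"]}}')
```

Here `result['details_json']` holds the `details` object serialized as a string, while the other fields pass through unchanged. The exact whitespace of the encoded string may differ between Python and Starlark.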

### Restrictions

The following restrictions apply to your script:

* Raising exceptions is not allowed.
* Imports are not allowed.

### Handling JSON

While `script` is mainly intended for text logs, it can also be used for JSON logs when you want to perform transformations beyond [those natively supported by Panther](https://docs.panther.com/data-onboarding/custom-log-types/transformations). For this reason, the `script` parser comes pre-loaded with a `json` module that lets you decode a JSON string into a dictionary.

For example, the following configuration creates a new field called `is_panther_employee`, which is `true` if the actor's email address ends with `@panther.com`, and `false` otherwise.

```yaml
parser:
  script:
    function: |
      def parse(log):
        event = json.decode(log)
        if event['actor']['email'].endswith('@panther.com'):
          event['is_panther_employee'] = True
        else:
          event['is_panther_employee'] = False
        return event
```

For ease of understanding, the above `parse` function is shown below with Python syntax highlighting:

```python
def parse(log):
  event = json.decode(log)
  if event['actor']['email'].endswith('@panther.com'):
    event['is_panther_employee'] = True
  else:
    event['is_panther_employee'] = False
  return event
```

## Example using `script`

Imagine the following log line, using the Apache Common Log format, is sent to Panther:

<pre><code>127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
</code></pre>

To parse this log type using `script`, we'll define the following function:

```python
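def parse(log):
  # Split on single spaces; assumes no field contains embedded spaces
  fields = log.split(' ')
  return {
    'remote_ip': fields[0],
    'identity': fields[1],
    'user': fields[2],
    # fields[3:5] holds '[10/Oct/2000:13:55:36' and '-0700]'
    'timestamp': ' '.join(fields[3:5]).strip('[]'),
    # The quoted request '"GET /apache_pb.gif HTTP/1.0"' spans fields[5:8]
    'method': fields[5].strip('"'),
    'request_uri': fields[6],
    'protocol': fields[7].strip('"'),
    'status': int(fields[8]),
    'bytes_sent': int(fields[9])
  }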

And use the following schema fields:

```yaml
fields:
  - name: remote_ip
    type: string
    indicators:
      - ip
  - name: identity
    type: string
  - name: user
    type: string
  - name: timestamp
    type: timestamp
    isEventTime: true
    timeFormats:
      - '%d/%b/%Y:%H:%M:%S %z'
  - name: method
    type: string
  - name: request_uri
    type: string
  - name: protocol
    type: string
  - name: status
    type: int
  - name: bytes_sent
    type: bigint
```

After the log above is normalized with this parser, it becomes:

```json
{
    "bytes_sent": 2326,
    "identity": "-",
    "method": "GET",
    "protocol": "HTTP/1.0",
    "remote_ip": "127.0.0.1",
    "request_uri": "/apache_pb.gif",
    "status": 200,
    "timestamp": "2000-10-10 20:55:36.000000000",
    "user": "frank"
}
```
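Splitting on single spaces works for the sample line, but it relies on every field sitting at a fixed position. A somewhat more defensive variant, sketched below, splits around the double-quoted request first and checks the number of pieces before indexing. The `raw` fallback field is hypothetical and is not part of the schema above. The handling of `-` for `bytes_sent` reflects the Common Log Format convention for responses with no body; mapping it to `0` is a choice made for this sketch.

```python
def parse(log):
  # Split into: text before the request, the request itself, text after it
  parts = log.split('"')
  if len(parts) < 3:
    return {'raw': log}  # hypothetical fallback field
  prefix = parts[0].strip().split(' ')
  request = parts[1].split(' ')
  suffix = parts[2].strip().split(' ')
  # Check field counts before indexing, since the script cannot raise exceptions
  if len(prefix) < 5 or len(request) != 3 or len(suffix) < 2:
    return {'raw': log}
  return {
    'remote_ip': prefix[0],
    'identity': prefix[1],
    'user': prefix[2],
    'timestamp': ' '.join(prefix[3:5]).strip('[]'),
    'method': request[0],
    'request_uri': request[1],
    'protocol': request[2],
    'status': int(suffix[0]),
    # Common Log Format writes "-" when no bytes were sent
    'bytes_sent': int(suffix[1]) if suffix[1].isdigit() else 0,
  }
```

For the sample log line, this variant produces the same nine fields as the schema expects. A line without a quoted request is returned under `raw` instead of causing an indexing error.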
