# Script Log Parser (Beta)

## Overview

{% hint style="info" %}
The script log parser is in open beta starting with Panther version 1.108, and is available to all customers. Please share any bug reports and feature requests with your Panther support team.
{% endhint %}

`script` is one of the possible values of the [`parser` key](https://docs.panther.com/~/changes/2402/data-onboarding/reference#parserspec) in a custom log schema. This parser lets you specify the transformations Panther should perform on each incoming log event using the [Starlark configuration language](https://bazel.build/rules/language), which shares many syntax similarities with Python. The `script` parser in Panther can handle both structured (JSON) and unstructured events.

You might benefit from using the `script` parser when you'd like to:

* Parse unstructured logs for which the other parser options ([`csv`](https://docs.panther.com/~/changes/2402/data-onboarding/custom-log-types/csv-parser), [`fastmatch`](https://docs.panther.com/~/changes/2402/data-onboarding/custom-log-types/fastmatch-parser), [`regex`](https://docs.panther.com/~/changes/2402/data-onboarding/custom-log-types/regex-parser)) are insufficient
* Perform transformations on the data that the Panther-provided [schema transformations](https://docs.panther.com/~/changes/2402/data-onboarding/custom-log-types/transformations) do not cover

## Understanding the `script` parser

### Defining a `function`

When using the `script` parser, you must implement a Starlark function and provide it in the `function` key of the parser configuration. The function takes a single [string](https://github.com/google/starlark-go/blob/master/doc/spec.md#strings) (the raw log event) and must return a non-empty [dictionary](https://github.com/google/starlark-go/blob/master/doc/spec.md#dictionaries). The returned dictionary defines the format of the output event.
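
As a minimal sketch of this contract, the function below takes a string and returns a non-empty dictionary. The input format (space-separated `key=value` pairs) and the `raw` fallback field are hypothetical and chosen only for illustration. The code uses only constructs that Starlark shares with Python.

```python
def parse(log):
  # Hypothetical input: space-separated key=value pairs,
  # for example "user=alice action=login".
  event = {}
  for pair in log.strip().split(" "):
    key, sep, value = pair.partition("=")
    # Skip tokens that do not contain "=".
    if sep:
      event[key] = value
  # The returned dictionary must not be empty, so fall back to the raw line
  # (the `raw` field name is an assumption for this sketch).
  if not event:
    event["raw"] = log
  return event
```

For the input `user=alice action=login`, this sketch returns a dictionary with the keys `user` and `action`.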

### Available functions

The `script` parser can use any of the primitives described in the [Starlark specification](https://github.com/google/starlark-go/blob/master/doc/spec.md). However, it is important to note that:

* Raising exceptions is not allowed.
* Imports are not allowed.

### Handling JSON

While `script` is mainly intended for text logs, it can also be used for JSON logs when you want to perform transformations beyond [the ones that are natively supported by Panther](https://docs.panther.com/~/changes/2402/data-onboarding/custom-log-types/transformations). For this reason, the `script` parser comes pre-loaded with a `json` module that lets you decode a JSON string into a dictionary.

For example, the following configuration will create a new field called `is_panther_employee` that will be `true` if the actor email has the `panther.com` domain, and `false` otherwise.

```yaml
parser:
  script:
    function: |
      def parse(log):
        event = json.decode(log)
        if event['actor']['email'].endswith('@panther.com'):
          event['is_panther_employee'] = True
        else:
          event['is_panther_employee'] = False
        return event
```

For ease of understanding, the above `parse` function is shown below with Python syntax highlighting:

```python
def parse(log):
  event = json.decode(log)
  if event['actor']['email'].endswith('@panther.com'):
    event['is_panther_employee'] = True
  else:
    event['is_panther_employee'] = False
  return event
```

## Example using `script`

Imagine the following log line, using the Apache Common Log format, is sent to Panther:

<pre><code><strong>127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
</strong></code></pre>

To parse this log type using `script`, we'll define the following function:

```python
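def parse(log):
  # Split on single spaces; the bracketed timestamp and the quoted request
  # each span more than one token.
  fields = log.split(" ")
  return {
    'remote_ip': fields[0],
    'identity': fields[1],
    'user': fields[2],
    # Rejoin '[10/Oct/2000:13:55:36' and '-0700]', then drop the brackets.
    'timestamp': ' '.join(fields[3:5]).strip('[]'),
    # The quoted request holds the method, URI, and protocol.
    'method': fields[5].strip('"'),
    'request_uri': fields[6],
    'protocol': fields[7].strip('"'),
    'status': int(fields[8]),
    'bytes_sent': int(fields[9])
  }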
```

And use the following schema fields:

```yaml
fields:
  - name: remote_ip
    type: string
    indicators:
      - ip
  - name: identity
    type: string
  - name: user
    type: string
  - name: timestamp
    type: timestamp
    isEventTime: true
    timeFormats:
      - '%d/%b/%Y:%H:%M:%S %z'
  - name: method
    type: string
  - name: request_uri
    type: string
  - name: protocol
    type: string
  - name: status
    type: int
  - name: bytes_sent
    type: bigint
```

After the log above is normalized with this parser, it becomes:

```json
{
    "bytes_sent": 2326,
    "identity": "-",
    "method": "GET",
    "protocol": "HTTP/1.0",
    "remote_ip": "127.0.0.1",
    "request_uri": "/apache_pb.gif",
    "status": 200,
    "timestamp": "2000-10-10 20:55:36.000000000",
    "user": "frank"
}
```
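
To see where the field indexes in the `parse` function come from, and why the normalized timestamp differs from the raw one, the sample line can be traced locally in plain Python. This is an illustrative check only and is not part of the schema. It assumes that Python's `strptime` interprets the `timeFormats` string the same way Panther does, and it imports `datetime`, which is not possible inside the `script` parser itself.

```python
from datetime import datetime, timezone

log = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# Splitting on single spaces yields ten tokens. The bracketed timestamp
# occupies tokens 3-4 and the quoted request occupies tokens 5-7.
fields = log.split(" ")
for index, token in enumerate(fields):
  print(index, token)

# Rejoin the two timestamp tokens and strip the surrounding brackets.
raw_timestamp = ' '.join(fields[3:5]).strip('[]')

# Parse with the same format string used in the schema, then convert to UTC.
parsed = datetime.strptime(raw_timestamp, '%d/%b/%Y:%H:%M:%S %z')
utc_timestamp = parsed.astimezone(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
print(utc_timestamp)  # 2000-10-10 20:55:36
```

The `-0700` offset explains the seven-hour shift between `13:55:36` in the raw log and `20:55:36` in the normalized event.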
