# Regex Log Parser

## Overview

For text log types with more complex structure, you can use the `regex` parser.

The `regex` parser uses named groups in regular expressions to extract field values from each line of text. You can use grok syntax (i.e., `%{PATTERN_NAME:field_name}`) to build complex expressions from the built-in patterns provided by Panther or from patterns you define yourself.

{% hint style="warning" %}
Panther's log processor uses the [`RE2` syntax](https://github.com/google/re2/wiki/Syntax) for regular expressions. `RE2` does not support some operations common to other regular expression engines, such as `lookbehind`. Be sure to check any expressions or grok patterns you copy/paste from other systems.
{% endhint %}
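Go's standard `regexp` package implements RE2 syntax, so it offers a quick local way to check whether a copied expression will compile at all. The sketch below is only a local check with Go's engine, not Panther's own schema validation; the `user=` expressions are made-up examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package, which implements RE2 syntax,
// accepts the expression.
func compiles(expr string) bool {
	_, err := regexp.Compile(expr)
	return err == nil
}

func main() {
	// A lookbehind assertion, common in PCRE-style engines, is rejected by RE2.
	fmt.Println(compiles(`(?<=user=)\w+`)) // false
	// An RE2-compatible rewrite: match the prefix and capture the value in a named group.
	fmt.Println(compiles(`user=(?P<user>\w+)`)) // true
}
```

Rewriting a lookbehind as "match the prefix, capture the rest" is usually enough for log parsing, because only the named group ends up in the extracted field.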

For example, to match the text

```
2020-10-10T14:32:05 [FOO_SERVICE@127.0.0.1] [DEBUG] "" Something went wrong
```

we can use the following grok pattern:

```
%{NOTSPACE:timestamp} \[%{WORD:service}@%{DATA:ip}\] \[%{WORD:log_level}\] %{GREEDYDATA:message}
```

This is roughly equivalent to the following raw regular expression:

```
(?P<timestamp>\S+) \[(?P<service>\w+)@(?P<ip>.*?)\] \[(?P<log_level>\w+)\] (?P<message>.*)
```
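To see exactly which values the named groups capture, you can run the raw expression through Go's `regexp` package, which uses RE2 syntax. This is a local sketch, not how Panther executes the parser; `extract` is a hypothetical helper, and the input is a line shaped like the sample above.

```go
package main

import (
	"fmt"
	"regexp"
)

// The raw RE2 equivalent of the grok pattern above.
var lineRE = regexp.MustCompile(
	`(?P<timestamp>\S+) \[(?P<service>\w+)@(?P<ip>.*?)\] \[(?P<log_level>\w+)\] (?P<message>.*)`)

// extract maps each named group to its captured text, or returns nil if the line does not match.
func extract(line string) map[string]string {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string)
	for i, name := range lineRE.SubexpNames() {
		if name != "" { // index 0 is the whole match and has no name
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	fields := extract(`2020-10-10T14:32:05 [FOO_SERVICE@127.0.0.1] [DEBUG] "" Something went wrong`)
	for _, k := range []string{"timestamp", "service", "ip", "log_level", "message"} {
		fmt.Printf("%s=%q\n", k, fields[k])
	}
}
```

Note that the lazy `.*?` used for `ip` (the `DATA` pattern) stops at the first `]`, while the greedy `.*` used for `message` (the `GREEDYDATA` pattern) takes the rest of the line.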

{% hint style="info" %}
For best performance, stick to simple built-in patterns such as `DATA`, `NOTSPACE`, `GREEDYDATA`, and `WORD`. Avoid complex expressions unless they are required to choose the field name based on the value (e.g., `(%{IP:ip_address}|%{WORD:username})`).
{% endhint %}
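An alternation like `(%{IP:ip_address}|%{WORD:username})` works because only one branch takes part in a match, so only one field receives a value. The Go sketch below expands that pattern by hand; the simplified IPv4 branch is an assumption standing in for the much longer built-in `IP` pattern, and `classify` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hand expansion of `(%{IP:ip_address}|%{WORD:username})`. The IP branch is a
// simplified IPv4 pattern, not Panther's built-in IP; anchors are added so the
// whole value must match one branch.
var actorRE = regexp.MustCompile(
	`^(?:(?P<ip_address>\d{1,3}(?:\.\d{1,3}){3})|(?P<username>\b\w+\b))$`)

// classify returns the name of the group that captured the value, or "" if
// neither branch matches.
func classify(value string) string {
	loc := actorRE.FindStringSubmatchIndex(value)
	if loc == nil {
		return ""
	}
	for i, name := range actorRE.SubexpNames() {
		// Groups that did not take part in the match report index -1.
		if name != "" && loc[2*i] >= 0 {
			return name
		}
	}
	return ""
}

func main() {
	fmt.Println(classify("10.10.0.117")) // ip_address
	fmt.Println(classify("mykonos"))     // username
}
```

When branches can overlap, put the more specific one first: Go's RE2 implementation uses leftmost-first matching, so among alternatives that match at the same position it prefers the earlier one.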

## Example using regex

Using the `regex` parser, we will define a log type for `Juniper.Audit` logs. Panther already [supports these logs natively](https://docs.panther.com/supported-logs/juniper#juniper.audit), but we use them here because their lines come in several conflicting variants that only the `regex` parser can handle.

The sample logs for `Juniper.Audit` are:

```
Jan 22 16:14:23 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.117] Logged in successfully
Jan 23 19:16:22 my-jwas [mws-audit][INFO] [ea77722a8516b0d1135abb19b1982852] Deactivate response 1832840420318015488
Feb 7 20:29:51 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.113] Login failed. Attempt: 1
Feb 14 19:02:54 my-jwas [mws-audit][INFO][mykonos] Changed configuration parameters: services.spotlight.enabled, services.spotlight.server_address
```

Here is how we would define a log schema for these logs using `regex`:

{% tabs %}
{% tab title="Console " %}
In the Panther Console, we would follow the instructions in [How to create a custom schema manually](https://docs.panther.com/data-onboarding/custom-log-types/..#how-to-create-a-custom-schema-manually) and select the **Regex** parser.

<figure><img src="https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2Fgit-blob-235e41eb1220881fb86dd387a07c90d56df1153e%2Fimage.png?alt=media" alt="In a &#x22;Schema&#x22; section, &#x22;Regex&#x22; is selected for a Parser field. There are various form fields shown, such as Pattern Definitions, Match Patterns, and Empty Values."><figcaption></figcaption></figure>

In the **Fields & Indicators** section (below the **Parser** section shown in the screenshot above), we would define the fields:

```yaml
fields:
- name: timestamp
  type: timestamp
  required: true
  timeFormats: 
   - '%b %d %H:%M:%S'
  isEventTime: false # the timestamps have no year, so we cannot use them as partition time
- name: log_level
  type: string
  required: true
- name: apikey
  type: string
- name: username
  type: string
- name: request_ip
  type: string
  indicators: [ip]
- name: message
  type: string
```

{% endtab %}

{% tab title="Full YAML representation" %}

```yaml
parser:
  regex:
    patternDefinitions:
      JUNIPER_TIMESTAMP: '[A-Z][a-z]{2} \d?\d \d\d:\d\d:\d\d'
      # An apikey is composed of 32 hex characters
      API_KEY: '[a-fA-F0-9]{32}'
    # We split the pattern into multiple parts so we can add comments that help us debug it later.
    # Panther concatenates all parts into a single pattern WITHOUT ADDING SPACES BETWEEN PARTS.
    # If you don't want to split your pattern, use an array with a single string.
    match:
    # The log line starts with a timestamp (captured as 'timestamp')
    - '^%{JUNIPER_TIMESTAMP:timestamp}'
    # Followed by this static text
    - ' my-jwas \[mws-audit\]'
    # Then comes the log level in square brackets, followed by an optional space (captured as 'log_level')
    - '\[%{DATA:log_level}\] ?' 
    # Next comes either an API key or a username in square brackets,
    # captured as 'apikey' or 'username' depending on which one matches
    - '\[(%{API_KEY:apikey}|%{USERNAME:username})\] '
    # Optionally followed by the IP address of the request in square brackets (captured as 'request_ip')
    # Note that we use 'DATA' instead of the more specific 'IP' pattern.
    # 'IP' is not needed because 'request_ip' always appears at this position, and the distinctive
    # ' my-jwas [mws-audit]' literal already makes us certain of the log type match.
    - '(\[%{DATA:request_ip}\])?'
    # And finally the rest of the line is the message (captured as 'message')
    - '%{GREEDYDATA:message}'
    trimSpace: true # Trim surrounding whitespace, such as the space before the message
fields:
- name: timestamp
  type: timestamp
  required: true
  timeFormats: 
   - '%b %d %H:%M:%S'
  isEventTime: false # the timestamps have no year, so we cannot use them as partition time
- name: log_level
  type: string
  required: true
- name: apikey
  type: string
- name: username
  type: string
- name: request_ip
  type: string
  indicators: [ip]
- name: message
  type: string
```

{% endtab %}
{% endtabs %}
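You can check the assembled pattern against the sample logs locally before saving the schema. The Go sketch below joins the `match` parts without separators, expands each `%{NAME:field}` reference into a named group, and runs the result against the four sample lines. It assumes each expansion can be wrapped in a group and that `trimSpace` trims every captured value; it is an approximation using Go's RE2 engine, not Panther's parser, and `expand`/`parse` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns used by the schema: the first two come from patternDefinitions,
// the rest are copied from the built-in reference tables.
var patterns = map[string]string{
	"JUNIPER_TIMESTAMP": `[A-Z][a-z]{2} \d?\d \d\d:\d\d:\d\d`,
	"API_KEY":           `[a-fA-F0-9]{32}`,
	"USERNAME":          `[a-zA-Z0-9._-]+`,
	"DATA":              `.*?`,
	"GREEDYDATA":        `.*`,
}

// The match parts from the schema, joined WITHOUT separators.
var parts = []string{
	`^%{JUNIPER_TIMESTAMP:timestamp}`,
	` my-jwas \[mws-audit\]`,
	`\[%{DATA:log_level}\] ?`,
	`\[(%{API_KEY:apikey}|%{USERNAME:username})\] `,
	`(\[%{DATA:request_ip}\])?`,
	`%{GREEDYDATA:message}`,
}

var grokRef = regexp.MustCompile(`%\{(\w+)(?::(\w+))?\}`)

// expand turns %{NAME:field} into (?P<field>...) and %{NAME} into (?:...).
func expand(p string) string {
	return grokRef.ReplaceAllStringFunc(p, func(ref string) string {
		m := grokRef.FindStringSubmatch(ref)
		if m[2] != "" {
			return "(?P<" + m[2] + ">" + patterns[m[1]] + ")"
		}
		return "(?:" + patterns[m[1]] + ")"
	})
}

var juniperRE = regexp.MustCompile(expand(strings.Join(parts, "")))

// parse returns the non-empty named captures, trimmed of surrounding
// whitespace, or nil if the line does not match.
func parse(line string) map[string]string {
	m := juniperRE.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := map[string]string{}
	for i, name := range juniperRE.SubexpNames() {
		if v := strings.TrimSpace(m[i]); name != "" && v != "" {
			fields[name] = v
		}
	}
	return fields
}

func main() {
	samples := []string{
		"Jan 22 16:14:23 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.117] Logged in successfully",
		"Jan 23 19:16:22 my-jwas [mws-audit][INFO] [ea77722a8516b0d1135abb19b1982852] Deactivate response 1832840420318015488",
		"Feb 7 20:29:51 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.113] Login failed. Attempt: 1",
		"Feb 14 19:02:54 my-jwas [mws-audit][INFO][mykonos] Changed configuration parameters: services.spotlight.enabled, services.spotlight.server_address",
	}
	for _, line := range samples {
		fmt.Println(parse(line))
	}
}
```

The second sample fills `apikey` and leaves `username` and `request_ip` empty, while the last one shows why the space after the log level is optional.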

## Built-in regex pattern reference

The following tables detail the built-in Panther regex patterns you can use.

### General

<table><thead><tr><th width="173">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>DATA</code></td><td><code>.*?</code></td></tr><tr><td><code>GREEDYDATA</code></td><td><code>.*</code></td></tr><tr><td><code>NOTSPACE</code></td><td><code>\S+</code></td></tr><tr><td><code>SPACE</code></td><td><code>\s*</code></td></tr><tr><td><code>WORD</code></td><td><code>\b\w+\b</code></td></tr><tr><td><code>QUOTEDSTRING</code></td><td><code>"(?:\\.|[^\\"]+)+"|""|'(?:\\.|[^\\']+)+'|''</code></td></tr><tr><td><code>HEXDIGIT</code></td><td><code>[0-9a-fA-F]</code></td></tr><tr><td><code>UUID</code></td><td><code>%{HEXDIGIT}{8}-(?:%{HEXDIGIT}{4}-){3}%{HEXDIGIT}{12}</code></td></tr></tbody></table>

### Numbers

<table><thead><tr><th width="153">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>INT</code></td><td><code>[+-]?(?:[0-9]+)</code></td></tr><tr><td><code>BASE10NUM</code></td><td><code>[+-]?(?:[0-9]+(?:\.[0-9]+)?)|\.[0-9]+</code></td></tr><tr><td><code>NUMBER</code></td><td><code>%{BASE10NUM}</code></td></tr><tr><td><code>BASE16NUM</code></td><td><code>(?:0[xX])?%{HEXDIGIT}+</code></td></tr><tr><td><code>POSINT</code></td><td><code>\b[1-9][0-9]*\b</code></td></tr><tr><td><code>NONNEGINT</code></td><td><code>\b[0-9]+\b</code></td></tr></tbody></table>

### Network

<table><thead><tr><th width="154">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>CISCOMAC</code></td><td><code>(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}</code></td></tr><tr><td><code>WINDOWSMAC</code></td><td><code>(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}</code></td></tr><tr><td><code>COMMONMAC</code></td><td><code>(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}</code></td></tr><tr><td><code>MAC</code></td><td><code>%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}</code></td></tr><tr><td><code>IPV6</code></td><td><code>\b(?:(?:(?:%{HEXDIGIT}{1,4}:){7}(?:%{HEXDIGIT}{1,4}|:))|(?:(?:%{HEXDIGIT}{1,4}:){6}(?::%{HEXDIGIT}{1,4}|(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(?:(?:%{HEXDIGIT}{1,4}:){5}(?:(?:(?::%{HEXDIGIT}{1,4}){1,2})|:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|((%{HEXDIGIT}{1,4}:){4}(((:%{HEXDIGIT}{1,4}){1,3})|((:%{HEXDIGIT}{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){3}(((:%{HEXDIGIT}{1,4}){1,4})|((:%{HEXDIGIT}{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){2}(((:%{HEXDIGIT}{1,4}){1,5})|((:%{HEXDIGIT}{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){1}(((:%{HEXDIGIT}{1,4}){1,6})|((:%{HEXDIGIT}{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:%{HEXDIGIT}{1,4}){1,7})|((:%{HEXDIGIT}{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\b</code></td></tr><tr><td><code>IPV4INT</code></td><td><code>25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]</code></td></tr><tr><td><code>IPV4</code></td><td><code>\b(?:(?:%{IPV4INT})\.){3}(?:%{IPV4INT})\b</code></td></tr><tr><td><code>IP</code></td><td><code>%{IPV6}|%{IPV4}</code></td></tr><tr><td><code>HOSTNAME</code></td><td><code>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)</code></td></tr><tr><td><code>IPORHOST</code></td><td><code>%{IP}|%{HOSTNAME}</code></td></tr><tr><td><code>HOSTPORT</code></td><td><code>%{IPORHOST}:%{POSINT}</code></td></tr></tbody></table>
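`IPV4` wraps each `%{IPV4INT}` in `(?:...)`, which matters because `IPV4INT` is a bare alternation. The Go sketch below substitutes the definition by hand, with each octet separator written as the escaped dot `\.`, and anchors the result so it tests whole strings; the anchoring is added for this example only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// IPV4INT and IPV4 from the Network table. IPV4INT is a bare alternation,
// which is why IPV4 wraps each use of it in (?:...).
const ipv4Int = `25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]`
const ipv4Def = `\b(?:(?:%{IPV4INT})\.){3}(?:%{IPV4INT})\b`

// ^(?:...)$ is added here only so the example tests whole strings.
var ipv4 = regexp.MustCompile("^(?:" + strings.ReplaceAll(ipv4Def, "%{IPV4INT}", ipv4Int) + ")$")

func main() {
	for _, s := range []string{"10.10.0.117", "255.255.255.255", "256.1.1.1", "10.10.0"} {
		fmt.Println(s, ipv4.MatchString(s))
	}
}
```

Without the inner `(?:...)`, the `\.` after `%{IPV4INT}` would bind only to the last branch, `[0-9]`, and the pattern would no longer describe a dotted quad.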

### URI

<table><thead><tr><th width="177">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>USERNAME</code></td><td><code>[a-zA-Z0-9._-]+</code></td></tr><tr><td><code>UNIXPATH</code></td><td><code>(?:/[\w_%!$@:.,-]?/?)(\S+)?</code></td></tr><tr><td><code>WINPATH</code></td><td><code>(?:[A-Za-z]:|\\)(?:\\[^\\?*]*)+</code></td></tr><tr><td><code>PATH</code></td><td><code>(?:%{UNIXPATH}|%{WINPATH})</code></td></tr><tr><td><code>TTY</code></td><td><code>(?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))</code></td></tr><tr><td><code>URIPROTO</code></td><td><code>[A-Za-z]+(?:\+[A-Za-z+]+)?</code></td></tr><tr><td><code>URIHOST</code></td><td><code>%{IPORHOST}(?::%{POSINT})?</code></td></tr><tr><td><code>URIPATH</code></td><td><code>(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_-]*)+</code></td></tr><tr><td><code>URIPARAM</code></td><td><code>\?[A-Za-z0-9$.+!*'|(){},~@#%&#x26;/=:;_?\-\[\]&#x3C;>]*</code></td></tr><tr><td><code>URIPATHPARAM</code></td><td><code>%{URIPATH}(?:%{URIPARAM})?</code></td></tr><tr><td><code>URI</code></td><td><code>%{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?</code></td></tr></tbody></table>

### Timestamps

<table><thead><tr><th width="236">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>MONTH</code></td><td><code>\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b</code></td></tr><tr><td><code>MONTHNUM</code></td><td><code>0?[1-9]|1[0-2]</code></td></tr><tr><td><code>MONTHNUM2</code></td><td><code>0[1-9]|1[0-2]</code></td></tr><tr><td><code>MONTHDAY</code></td><td><code>(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]</code></td></tr><tr><td><code>DAY</code></td><td><code>\b(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)\b</code></td></tr><tr><td><code>YEAR</code></td><td><code>(?:\d\d){1,2}</code></td></tr><tr><td><code>HOUR</code></td><td><code>2[0123]|[01]?[0-9]</code></td></tr><tr><td><code>MINUTE</code></td><td><code>[0-5][0-9]</code></td></tr><tr><td><code>SECOND</code></td><td><code>(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?</code></td></tr><tr><td><code>KITCHEN</code></td><td><code>%{HOUR}:%{MINUTE}</code></td></tr><tr><td><code>TIME</code></td><td><code>%{HOUR}:%{MINUTE}:%{SECOND}</code></td></tr><tr><td><code>DATE_US</code></td><td><code>%{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}</code></td></tr><tr><td><code>DATE_EU</code></td><td><code>%{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}</code></td></tr><tr><td><code>ISO8601_TIMEZONE</code></td><td><code>(?:Z|[+-]%{HOUR}(?::?%{MINUTE}))</code></td></tr><tr><td><code>ISO8601_SECOND</code></td><td><code>(?:%{SECOND}|60)</code></td></tr><tr><td><code>TIMESTAMP_ISO8601</code></td><td><code>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?</code></td></tr><tr><td><code>DATE</code></td><td><code>%{DATE_US}|%{DATE_EU}</code></td></tr><tr><td><code>DATETIME</code></td><td><code>%{DATE}[- ]%{TIME}</code></td></tr><tr><td><code>TZ</code></td><td><code>[A-Z]{3}</code></td></tr><tr><td><code>TZOFFSET</code></td><td><code>[+-]\d{4}</code></td></tr><tr><td><code>TIMESTAMP_RFC822</code></td><td><code>%{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}</code></td></tr><tr><td><code>TIMESTAMP_RFC2822</code></td><td><code>%{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}</code></td></tr><tr><td><code>TIMESTAMP_OTHER</code></td><td><code>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}</code></td></tr><tr><td><code>TIMESTAMP_EVENTLOG</code></td><td><code>%{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}</code></td></tr><tr><td><code>SYSLOGTIMESTAMP</code></td><td><code>%{MONTH} +%{MONTHDAY} %{TIME}</code></td></tr><tr><td><code>HTTPDATE</code></td><td><code>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{TZOFFSET}</code></td></tr></tbody></table>
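The Juniper sample timestamps earlier on this page also fit the built-in `SYSLOGTIMESTAMP`, whose ` +` additionally tolerates the double space that BSD-style syslog uses to pad single-digit days. The Go sketch below expands the definitions from this table, assuming each reference is wrapped in `(?:...)` so that bare alternations such as `MONTHDAY` and `HOUR` stay local; `syslogTS` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Definitions copied from the Timestamps table.
var defs = map[string]string{
	"MONTH":           `\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b`,
	"MONTHDAY":        `(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]`,
	"HOUR":            `2[0123]|[01]?[0-9]`,
	"MINUTE":          `[0-5][0-9]`,
	"SECOND":          `(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?`,
	"TIME":            `%{HOUR}:%{MINUTE}:%{SECOND}`,
	"SYSLOGTIMESTAMP": `%{MONTH} +%{MONTHDAY} %{TIME}`,
}

var ref = regexp.MustCompile(`%\{(\w+)\}`)

// expand substitutes %{NAME} references recursively, wrapping each in (?:...)
// so that alternations inside a definition do not leak into the outer pattern.
func expand(p string) string {
	for ref.MatchString(p) {
		p = ref.ReplaceAllStringFunc(p, func(r string) string {
			return "(?:" + defs[ref.FindStringSubmatch(r)[1]] + ")"
		})
	}
	return p
}

var syslogTS = regexp.MustCompile("^" + expand("%{SYSLOGTIMESTAMP}") + "$")

func main() {
	for _, s := range []string{"Jan 22 16:14:23", "Feb 7 20:29:51", "Feb  7 20:29:51"} {
		fmt.Printf("%q %v\n", s, syslogTS.MatchString(s))
	}
}
```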

### Aliases

<table><thead><tr><th width="134">Name</th><th>Equivalent To</th></tr></thead><tbody><tr><td><code>NS</code></td><td><code>NOTSPACE</code></td></tr><tr><td><code>QS</code></td><td><code>QUOTEDSTRING</code></td></tr><tr><td><code>HOST</code></td><td><code>HOSTNAME</code></td></tr><tr><td><code>PID</code></td><td><code>POSINT</code></td></tr><tr><td><code>USER</code></td><td><code>USERNAME</code></td></tr></tbody></table>
