# Regex Log Parser

## Overview

For text logs with a more complex structure, you can use the `regex` parser.

The `regex` parser uses named groups in regular expressions to extract field values from each line of text. You can use grok syntax (i.e., `%{PATTERN_NAME:field_name}`) to build complex expressions from the built-in patterns provided by Panther or from patterns you define yourself.

{% hint style="warning" %}
Panther's log processor uses the [`RE2` syntax](https://github.com/google/re2/wiki/Syntax) for regular expressions. `RE2` does not support some operations common to other regular expression engines, such as `lookbehind`. Be sure to check any expressions or grok patterns you copy/paste from other systems.
{% endhint %}
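
A lookbehind can often be replaced by matching the prefix literally and capturing only the part you need in a named group. The sketch below illustrates the idea with Python's `re` module, which is not `RE2` and accepts both forms; only the second form uses syntax that `RE2` also supports. The sample line and the `user` field name are made up for illustration.

```python
import re

line = "action=login user=alice status=ok"  # hypothetical log line

# Lookbehind form: valid in Python's `re`, but not supported by RE2.
lookbehind = re.search(r"(?<=user=)\w+", line)

# RE2-compatible form: match the prefix, capture the value in a named group.
named_group = re.search(r"user=(?P<user>\w+)", line)

print(lookbehind.group(0))        # alice
print(named_group.group("user"))  # alice
```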

For example, consider the following log line:

```
2020-10-10T14:32:05 [FOO_SERVICE@127.0.0.1] [DEBUG] "" Something when wrong
```

We can match it with this grok pattern:

```
%{NOTSPACE:timestamp} \[%{WORD:service}@%{DATA:ip}\] \[%{WORD:log_level}\] %{GREEDYDATA:message}
```

This is roughly equivalent to the following 'raw' regular expression:

```
(?P<timestamp>\S+) \[(?P<service>\w+)@(?P<ip>.*?)\] \[(?P<log_level>\w+)\] (?P<message>.*)
```
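
To make the relationship between the two forms concrete, here is a minimal Python sketch that expands `%{NAME:field}` references into named groups and applies the result to the sample line. The `expand_grok` helper and the `PATTERNS` dictionary are hypothetical names for this illustration, not Panther's implementation, and Python's `re` module is used only as a stand-in for `RE2`.

```python
import re

# Small subset of the built-in patterns from the reference tables on this page.
# WORD is simplified to \w+ to mirror the 'raw' expression above.
PATTERNS = {
    "NOTSPACE": r"\S+",
    "WORD": r"\w+",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
}

def expand_grok(pattern: str) -> str:
    """Replace each %{NAME:field} with a named group (?P<field>...)."""
    def to_group(m: re.Match) -> str:
        return f"(?P<{m.group('field')}>{PATTERNS[m.group('name')]})"
    return re.sub(r"%\{(?P<name>\w+):(?P<field>\w+)\}", to_group, pattern)

grok = (r"%{NOTSPACE:timestamp} \[%{WORD:service}@%{DATA:ip}\] "
        r"\[%{WORD:log_level}\] %{GREEDYDATA:message}")
raw = expand_grok(grok)

line = '2020-10-10T14:32:05 [FOO_SERVICE@127.0.0.1] [DEBUG] "" Something when wrong'
fields = re.match(raw, line).groupdict()

print(raw)
print(fields["service"], fields["ip"], fields["log_level"])  # FOO_SERVICE 127.0.0.1 DEBUG
```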

{% hint style="info" %}
For best performance, stick to simple built-in patterns such as `DATA`, `NOTSPACE`, `GREEDYDATA`, and `WORD`. Avoid complex expressions unless they are required to determine the field name from the value (e.g., `(%{IP:ip_address}|%{WORD:username})`).
{% endhint %}

## Example using regex

Using the `regex` parser, we will define a log type for `Juniper.Audit` logs. Panther already [supports these logs natively](https://docs.panther.com/data-onboarding/custom-log-types/pages/-MXJ6kVgmG-A7Sg_EzVh#juniper.audit), but we use them here because their lines come in several conflicting forms that can only be handled with the `regex` parser.

The sample logs for `Juniper.Audit` are:

```
Jan 22 16:14:23 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.117] Logged in successfully
Jan 23 19:16:22 my-jwas [mws-audit][INFO] [ea77722a8516b0d1135abb19b1982852] Deactivate response 1832840420318015488
Feb 7 20:29:51 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.113] Login failed. Attempt: 1
Feb 14 19:02:54 my-jwas [mws-audit][INFO][mykonos] Changed configuration parameters: services.spotlight.enabled, services.spotlight.server_address
```

Here is how we would define a log schema for these logs using `regex`:

{% tabs %}
{% tab title="Console " %}
In the Panther Console, we would follow the [instructions for creating a custom schema manually](/data-onboarding/custom-log-types.md#how-to-create-a-custom-schema-manually), selecting the **Regex** parser.

<figure><img src="/files/UpKUb9KEArwsJw62wa7c" alt="In a &#x22;Schema&#x22; section, &#x22;Regex&#x22; is selected for a Parser field. There are various form fields shown, such as Pattern Definitions, Match Patterns, and Empty Values."><figcaption></figcaption></figure>

In the **Fields & Indicators** section (below the **Parser** section shown in the screenshot above), we would define the fields:

```yaml
fields:
- name: timestamp
  type: timestamp
  required: true
  timeFormats: 
   - '%b %d %H:%M:%S'
  isEventTime: false # the timestamps have no year so we cannot use them as partition time
- name: log_level
  type: string
  required: true
- name: apikey
  type: string
- name: username
  type: string
- name: request_ip
  type: string
  indicators: [ip]
- name: message
  type: string
```

{% endtab %}

{% tab title="Full YAML representation" %}

```yaml
parser:
  regex:
    patternDefinitions:
      JUNIPER_TIMESTAMP: '[A-Z][a-z]{2} \d?\d \d\d:\d\d:\d\d'
      # An apikey is composed of 32 hex characters
      API_KEY: '[a-fA-F0-9]{32}'
    # We split the pattern into multiple parts so we can add comments that help us debug it in the future.
    # All parts are concatenated into a single pattern by Panther WITHOUT ADDING SPACES BETWEEN PARTS.
    # If you don't want to split your pattern, just use an array with a single string.
    match:
    # The log line starts with a timestamp (captured as 'timestamp')
    - '^%{JUNIPER_TIMESTAMP:timestamp}'
    # Followed by this static text
    - ' my-jwas \[mws-audit\]'
    # Then comes the log level surrounded by square brackets and optional space (captured as 'log_level')
    - '\[%{DATA:log_level}\] ?' 
    # After it, we get either an api key or a user name, surrounded by square brackets,
    # which we capture as 'apikey' or 'username' depending on the match
    - '\[(%{API_KEY:apikey}|%{USERNAME:username})\] '
    # Optionally followed by the ip address of the request in square brackets (captured as 'request_ip')
    # Note that we use 'DATA' instead of the more specific 'IP' named pattern.
    # The stricter pattern is not needed because 'request_ip' always appears at this position and
    # the distinctive ' my-jwas [mws-audit]' literal already makes us certain of the log type match.
    - '(\[%{DATA:request_ip}\])?'
    # And finally the rest of the line is the message (captured as 'message')
    - '%{GREEDYDATA:message}'
    trimSpace: true # We want to trim the space of the message
fields:
- name: timestamp
  type: timestamp
  required: true
  timeFormats: 
   - '%b %d %H:%M:%S'
  isEventTime: false # the timestamps have no year so we cannot use them as partition time
- name: log_level
  type: string
  required: true
- name: apikey
  type: string
- name: username
  type: string
- name: request_ip
  type: string
  indicators: [ip]
- name: message
  type: string
```

{% endtab %}
{% endtabs %}
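
As a quick sanity check of the pattern, the following Python sketch joins hand-expanded versions of the `match` parts and runs them against the four sample lines. It is an approximation under stated assumptions: Python's `re` stands in for `RE2`, the grok references are expanded manually using the definitions on this page, the `parse` helper is a hypothetical name, and stripping whitespace from every captured value is only our reading of what `trimSpace` does.

```python
import re

# Hand-expanded equivalents of the `match` parts, joined without spaces.
PARTS = [
    r"^(?P<timestamp>[A-Z][a-z]{2} \d?\d \d\d:\d\d:\d\d)",  # JUNIPER_TIMESTAMP
    r" my-jwas \[mws-audit\]",
    r"\[(?P<log_level>.*?)\] ?",                             # DATA
    r"\[((?P<apikey>[a-fA-F0-9]{32})|(?P<username>[a-zA-Z0-9._-]+))\] ",
    r"(\[(?P<request_ip>.*?)\])?",                           # DATA
    r"(?P<message>.*)",                                      # GREEDYDATA
]
JUNIPER_AUDIT = re.compile("".join(PARTS))

def parse(line: str) -> dict:
    """Return the captured fields, skipping groups that did not match."""
    m = JUNIPER_AUDIT.match(line)
    if m is None:
        return {}
    # Assumption: trimSpace strips surrounding whitespace from each value.
    return {k: v.strip() for k, v in m.groupdict().items() if v is not None}

samples = [
    "Jan 22 16:14:23 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.117] Logged in successfully",
    "Jan 23 19:16:22 my-jwas [mws-audit][INFO] [ea77722a8516b0d1135abb19b1982852] Deactivate response 1832840420318015488",
    "Feb 7 20:29:51 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.113] Login failed. Attempt: 1",
    "Feb 14 19:02:54 my-jwas [mws-audit][INFO][mykonos] Changed configuration parameters: services.spotlight.enabled, services.spotlight.server_address",
]
for sample in samples:
    print(parse(sample))
```

The second sample yields an `apikey` and no `username`, while the others yield a `username`, which is the value-based field selection the alternation is there for.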

## Built-in regex pattern reference

The following tables detail the built-in Panther regex patterns you can use.

### General

<table><thead><tr><th width="173">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>DATA</code></td><td><code>.*?</code></td></tr><tr><td><code>GREEDYDATA</code></td><td><code>.*</code></td></tr><tr><td><code>NOTSPACE</code></td><td><code>\S+</code></td></tr><tr><td><code>SPACE</code></td><td><code>\s*</code></td></tr><tr><td><code>WORD</code></td><td><code>\b\w+\b</code></td></tr><tr><td><code>QUOTEDSTRING</code></td><td><code>"(?:\\.|[^\\"]+)+"|""|'(?:\\.|[^\\']+)+'|''</code></td></tr><tr><td><code>HEXDIGIT</code></td><td><code>[0-9a-fA-F]</code></td></tr><tr><td><code>UUID</code></td><td><code>%{HEXDIGIT}{8}-(?:%{HEXDIGIT}{4}-){3}%{HEXDIGIT}{12}</code></td></tr></tbody></table>
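
The difference between `DATA` (lazy, `.*?`) and `GREEDYDATA` (greedy, `.*`) matters whenever a delimiter can appear more than once on a line. The short Python sketch below shows the effect on a made-up line; Python's `re` is used here only as a stand-in for `RE2`, which treats these two quantifiers the same way for this input.

```python
import re

line = "[INFO] [mykonos] Logged in"  # hypothetical log line

lazy = re.match(r"\[(?P<value>.*?)\]", line)   # DATA expands to .*?
greedy = re.match(r"\[(?P<value>.*)\]", line)  # GREEDYDATA expands to .*

print(lazy.group("value"))    # INFO
print(greedy.group("value"))  # INFO] [mykonos
```

This is why the examples on this page use `DATA` for bracketed fields and reserve `GREEDYDATA` for the trailing message.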

### Numbers

<table><thead><tr><th width="153">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>INT</code></td><td><code>[+-]?(?:[0-9]+)</code></td></tr><tr><td><code>BASE10NUM</code></td><td><code>[+-]?(?:[0-9]+(?:\.[0-9]+)?)|\.[0-9]+</code></td></tr><tr><td><code>NUMBER</code></td><td><code>%{BASE10NUM}</code></td></tr><tr><td><code>BASE16NUM</code></td><td><code>(?:0[xX])?%{HEXDIGIT}+</code></td></tr><tr><td><code>POSINT</code></td><td><code>\b[1-9][0-9]*\b</code></td></tr><tr><td><code>NONNEGINT</code></td><td><code>\b[0-9]+\b</code></td></tr></tbody></table>

### Network

<table><thead><tr><th width="154">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>CISCOMAC</code></td><td><code>(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}</code></td></tr><tr><td><code>WINDOWSMAC</code></td><td><code>(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}</code></td></tr><tr><td><code>COMMONMAC</code></td><td><code>(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}</code></td></tr><tr><td><code>MAC</code></td><td><code>%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}</code></td></tr><tr><td><code>IPV6</code></td><td><code>\b(?:(?:(?:%{HEXDIGIT}{1,4}:){7}(?:%{HEXDIGIT}{1,4}|:))|(?:(?:%{HEXDIGIT}{1,4}:){6}(?::%{HEXDIGIT}{1,4}|(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(?:(?:%{HEXDIGIT}{1,4}:){5}(?:(?:(?::%{HEXDIGIT}{1,4}){1,2})|:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|((%{HEXDIGIT}{1,4}:){4}(((:%{HEXDIGIT}{1,4}){1,3})|((:%{HEXDIGIT}{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){3}(((:%{HEXDIGIT}{1,4}){1,4})|((:%{HEXDIGIT}{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){2}(((:%{HEXDIGIT}{1,4}){1,5})|((:%{HEXDIGIT}{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){1}(((:%{HEXDIGIT}{1,4}){1,6})|((:%{HEXDIGIT}{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:%{HEXDIGIT}{1,4}){1,7})|((:%{HEXDIGIT}{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\b</code></td></tr><tr><td><code>IPV4INT</code></td><td><code>25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]</code></td></tr><tr><td><code>IPV4</code></td><td><code>\b(?:(?:%{IPV4INT})\.){3}(?:%{IPV4INT})\b</code></td></tr><tr><td><code>IP</code></td><td><code>%{IPV6}|%{IPV4}</code></td></tr><tr><td><code>HOSTNAME</code></td><td><code>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)</code></td></tr><tr><td><code>IPORHOST</code></td><td><code>%{IP}|%{HOSTNAME}</code></td></tr><tr><td><code>HOSTPORT</code></td><td><code>%{IPORHOST}:%{POSINT}</code></td></tr></tbody></table>

### URI

<table><thead><tr><th width="177">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>USERNAME</code></td><td><code>[a-zA-Z0-9._-]+</code></td></tr><tr><td><code>UNIXPATH</code></td><td><code>(?:/[\w_%!$@:.,-]?/?)(\S+)?</code></td></tr><tr><td><code>WINPATH</code></td><td><code>(?:[A-Za-z]:|\\)(?:\\[^\\?*]*)+</code></td></tr><tr><td><code>PATH</code></td><td><code>(?:%{UNIXPATH}|%{WINPATH})</code></td></tr><tr><td><code>TTY</code></td><td><code>(?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))</code></td></tr><tr><td><code>URIPROTO</code></td><td><code>[A-Za-z]+(?:\+[A-Za-z+]+)?</code></td></tr><tr><td><code>URIHOST</code></td><td><code>%{IPORHOST}(?::%{POSINT})?</code></td></tr><tr><td><code>URIPATH</code></td><td><code>(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_-]*)+</code></td></tr><tr><td><code>URIPARAM</code></td><td><code>\?[A-Za-z0-9$.+!*'|(){},~@#%&#x26;/=:;_?\-\[\]&#x3C;>]*</code></td></tr><tr><td><code>URIPATHPARAM</code></td><td><code>%{URIPATH}(?:%{URIPARAM})?</code></td></tr><tr><td><code>URI</code></td><td><code>%{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?</code></td></tr></tbody></table>

### Timestamps

<table><thead><tr><th width="236">Name</th><th>Regex</th></tr></thead><tbody><tr><td><code>MONTH</code></td><td><code>\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b</code></td></tr><tr><td><code>MONTHNUM</code></td><td><code>0?[1-9]|1[0-2]</code></td></tr><tr><td><code>MONTHNUM2</code></td><td><code>0[1-9]|1[0-2]</code></td></tr><tr><td><code>MONTHDAY</code></td><td><code>(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]</code></td></tr><tr><td><code>DAY</code></td><td><code>\b(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)\b</code></td></tr><tr><td><code>YEAR</code></td><td><code>(?:\d\d){1,2}</code></td></tr><tr><td><code>HOUR</code></td><td><code>2[0123]|[01]?[0-9]</code></td></tr><tr><td><code>MINUTE</code></td><td><code>[0-5][0-9]</code></td></tr><tr><td><code>SECOND</code></td><td><code>(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?</code></td></tr><tr><td><code>KITCHEN</code></td><td><code>%{HOUR}:%{MINUTE}</code></td></tr><tr><td><code>TIME</code></td><td><code>%{HOUR}:%{MINUTE}:%{SECOND}</code></td></tr><tr><td><code>DATE_US</code></td><td><code>%{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}</code></td></tr><tr><td><code>DATE_EU</code></td><td><code>%{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}</code></td></tr><tr><td><code>ISO8601_TIMEZONE</code></td><td><code>(?:Z|[+-]%{HOUR}(?::?%{MINUTE}))</code></td></tr><tr><td><code>ISO8601_SECOND</code></td><td><code>(?:%{SECOND}|60)</code></td></tr><tr><td><code>TIMESTAMP_ISO8601</code></td><td><code>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?</code></td></tr><tr><td><code>DATE</code></td><td><code>%{DATE_US}|%{DATE_EU}</code></td></tr><tr><td><code>DATETIME</code></td><td><code>%{DATE}[- ]%{TIME}</code></td></tr><tr><td><code>TZ</code></td><td><code>[A-Z]{3}</code></td></tr><tr><td><code>TZOFFSET</code></td><td><code>[+-]\d{4}</code></td></tr><tr><td><code>TIMESTAMP_RFC822</code></td><td><code>%{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}</code></td></tr><tr><td><code>TIMESTAMP_RFC2822</code></td><td><code>%{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}</code></td></tr><tr><td><code>TIMESTAMP_OTHER</code></td><td><code>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}</code></td></tr><tr><td><code>TIMESTAMP_EVENTLOG</code></td><td><code>%{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}</code></td></tr><tr><td><code>SYSLOGTIMESTAMP</code></td><td><code>%{MONTH} +%{MONTHDAY} %{TIME}</code></td></tr><tr><td><code>HTTPDATE</code></td><td><code>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{TZOFFSET}</code></td></tr></tbody></table>
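
Many of these patterns are built from other patterns. The Python sketch below shows one way such nested references could be inlined, using `SYSLOGTIMESTAMP` and the patterns it depends on. The `expand` helper is a hypothetical name, and wrapping each reference in a non-capturing group is an assumption made here so that alternations such as `MONTHDAY` stay self-contained; Panther's actual expansion may differ.

```python
import re

# Subset of the timestamp patterns from the table above.
PATTERNS = {
    "MONTH": (r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
              r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b"),
    "MONTHDAY": r"(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]",
    "HOUR": r"2[0123]|[01]?[0-9]",
    "MINUTE": r"[0-5][0-9]",
    "SECOND": r"(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?",
    "TIME": r"%{HOUR}:%{MINUTE}:%{SECOND}",
    "SYSLOGTIMESTAMP": r"%{MONTH} +%{MONTHDAY} %{TIME}",
}

def expand(name: str) -> str:
    """Recursively inline %{NAME} references, wrapping each in (?:...)."""
    body = re.sub(r"%\{(\w+)\}", lambda m: expand(m.group(1)), PATTERNS[name])
    return f"(?:{body})"

syslog_ts = re.compile(expand("SYSLOGTIMESTAMP"))

for candidate in ["Jan 22 16:14:23", "Feb 7 20:29:51", "2020-10-10T14:32:05"]:
    print(candidate, bool(syslog_ts.fullmatch(candidate)))
```

The first two candidates are timestamps from the `Juniper.Audit` samples and match; the ISO-style timestamp does not.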

### Aliases

<table><thead><tr><th width="134">Name</th><th>Equivalent To</th></tr></thead><tbody><tr><td><code>NS</code></td><td><code>NOTSPACE</code></td></tr><tr><td><code>QS</code></td><td><code>QUOTEDSTRING</code></td></tr><tr><td><code>HOST</code></td><td><code>HOSTNAME</code></td></tr><tr><td><code>PID</code></td><td><code>POSINT</code></td></tr><tr><td><code>USER</code></td><td><code>USERNAME</code></td></tr></tbody></table>

