Regex Log Parser

Overview

For text log types with more complex structure, you can use the regex parser.

The regex parser uses named groups in regular expressions to extract field values from each line of text. You can use grok syntax (i.e., %{PATTERN_NAME:field_name}) to build complex expressions from the built-in patterns provided by Panther or from patterns you define yourself.

Panther's log processor uses RE2 syntax for regular expressions. RE2 does not support some features common in other regular expression engines, such as lookbehind, lookahead, and backreferences. Be sure to check any expressions or grok patterns you copy from other systems.
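As a rough pre-flight check for patterns copied from other systems, the sketch below flags a few constructs that RE2 rejects. It is a heuristic under stated assumptions, not an RE2 validator: the `UNSUPPORTED` table and the `re2_warnings` helper are hypothetical names, and the list of constructs is not exhaustive.

```python
import re

# Constructs that RE2 rejects but backtracking engines usually accept.
# Heuristic only: escaped sequences such as "\\1" can cause false positives.
UNSUPPORTED = {
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
    "atomic group": re.compile(r"\(\?>"),
    "backreference": re.compile(r"\\[1-9]"),
}


def re2_warnings(pattern: str) -> list:
    """Return the names of RE2-incompatible constructs found in pattern."""
    return [name for name, rx in UNSUPPORTED.items() if rx.search(pattern)]


# A made-up pattern in the style of other grok libraries.
print(re2_warnings(r"(?<![0-9.+-])(?>[+-]?[0-9]+)"))
# A pattern that only uses named groups passes the check.
print(re2_warnings(r"(?P<timestamp>\S+) (?P<message>.*)"))
```

Any pattern that produces a warning needs to be rewritten without that construct before it is used in a schema.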

For example, to match the text:

2020-10-10T14:32:05 [[email protected]] [DEBUG] "" Something when wrong

We can use this grok pattern:

%{NOTSPACE:timestamp} \[%{WORD:service}@%{DATA:ip}\] \[%{WORD:log_level}\] %{GREEDYDATA:message}

This is roughly equivalent to the following 'raw' regular expression:

(?P<timestamp>\S+) \[(?P<service>\w+)@(?P<ip>.*?)\] \[(?P<log_level>\w+)\] (?P<message>.*)
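To see what the named groups capture, the raw expression can be tried in Python, whose `re` module accepts the same `(?P<name>...)` syntax for this pattern. This is only an illustration: Python's engine is not RE2, and the service name and IP address in the sample line are made up.

```python
import re

# Raw regular expression from the text above.
PATTERN = re.compile(
    r"(?P<timestamp>\S+) \[(?P<service>\w+)@(?P<ip>.*?)\] "
    r"\[(?P<log_level>\w+)\] (?P<message>.*)"
)

# Hypothetical log line; "myservice" and "10.0.0.1" are placeholder values.
line = '2020-10-10T14:32:05 [myservice@10.0.0.1] [DEBUG] "" Something went wrong'

# groupdict() returns one entry per named group, keyed by field name.
fields = PATTERN.match(line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```

Each named group becomes one extracted field, and everything after the log level ends up in `message`.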

For best performance, stick to simple built-in patterns such as DATA, NOTSPACE, GREEDYDATA, and WORD. Avoid complex expressions unless they are required to determine the field name from the value, e.g. (%{IP:ip_address}|%{WORD:username}).
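The alternation case mentioned above can be sketched as follows. The `IP` and `WORD` expressions are simplified stand-ins rather than Panther's built-in definitions, and `actor_fields` is a hypothetical helper: the point is only that the branch which matches decides which field is populated.

```python
import re

# Simplified stand-ins for %{IP} and %{WORD}; not Panther's actual definitions.
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
WORD = r"\w+"

# Equivalent in spirit to \[(%{IP:ip_address}|%{WORD:username})\]
ACTOR = re.compile(rf"\[(?:(?P<ip_address>{IP})|(?P<username>{WORD}))\]")


def actor_fields(text: str) -> dict:
    """Return whichever of ip_address/username matched, dropping the empty one."""
    m = ACTOR.search(text)
    if m is None:
        return {}
    return {k: v for k, v in m.groupdict().items() if v is not None}


print(actor_fields("[10.10.0.117]"))
print(actor_fields("[mykonos]"))
```

The more specific branch is listed first, because a bare word pattern would otherwise also accept the leading digits of an address.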

Example using regex

Using the regex parser, we will define a log type for Juniper.Audit logs. Panther already supports these logs natively, but they make a good example because their lines take several conflicting forms that can only be handled with the regex parser.

The sample logs for Juniper.Audit are:

Jan 22 16:14:23 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.117] Logged in successfully
Jan 23 19:16:22 my-jwas [mws-audit][INFO] [ea77722a8516b0d1135abb19b1982852] Deactivate response 1832840420318015488
Feb 7 20:29:51 my-jwas [mws-audit][INFO] [mykonos] [10.10.0.113] Login failed. Attempt: 1
Feb 14 19:02:54 my-jwas [mws-audit][INFO][mykonos] Changed configuration parameters: services.spotlight.enabled, services.spotlight.server_address

Here is how we would define a log schema for these logs using regex:

In the Panther Console, we would follow the "How to create a custom schema manually" instructions, selecting the Regex parser.

In the Fields & Indicators section (below the Parser section shown in the screenshot above), we would define the fields:

fields:
- name: timestamp
  type: timestamp
  required: true
  timeFormats:
  - '%b %d %H:%M:%S'
  isEventTime: false # the timestamps have no year so we cannot use them as partition time
- name: log_level
  type: string
  required: true
- name: apikey
  type: string
- name: username
  type: string
- name: request_ip
  type: string
  indicators: [ip]
- name: message
  type: string
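To make the field list concrete, here is a minimal Python sketch that extracts the same fields from the four sample lines. The expressions are illustrative guesses written for this example, not the patterns Panther uses for Juniper.Audit, and `PREFIX`, `SUFFIXES`, and `parse_line` are hypothetical names. Each of the three line forms gets its own suffix, tried in order.

```python
import re
from typing import Optional

# Shared prefix: syslog-style timestamp, host, "[mws-audit]" tag and log level.
# Illustrative guesses only, not Panther's Juniper.Audit parser.
PREFIX = (
    r"^(?P<timestamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"\[mws-audit\]\[(?P<log_level>\w+)\] ?"
)

# One suffix per observed form, most specific first.
SUFFIXES = [
    # [username] [ip] message
    r"\[(?P<username>\w+)\] \[(?P<request_ip>[0-9.]+)\] (?P<message>.*)$",
    # [32-character hex API key] message
    r"\[(?P<apikey>[0-9a-f]{32})\] (?P<message>.*)$",
    # [username] message
    r"\[(?P<username>\w+)\] (?P<message>.*)$",
]
PATTERNS = [re.compile(PREFIX + suffix) for suffix in SUFFIXES]


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of the first pattern that matches, or None."""
    for rx in PATTERNS:
        m = rx.match(line)
        if m:
            return m.groupdict()
    return None


sample = (
    "Jan 23 19:16:22 my-jwas [mws-audit][INFO] "
    "[ea77722a8516b0d1135abb19b1982852] Deactivate response 1832840420318015488"
)
print(parse_line(sample))
```

Only `timestamp` and `log_level` appear in every form, which is consistent with them being the only fields marked `required` in the schema above.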

Built-in regex pattern reference

The following tables detail the built-in Panther regex patterns you can use.

General

Name          Regex
DATA          .*?
GREEDYDATA    .*
NOTSPACE      \S+
SPACE         \s*
WORD          \b\w+\b
QUOTEDSTRING  "(?:\\.|[^\\"]+)+"|""|'(?:\\.|[^\\']+)+'|''
HEXDIGIT      [0-9a-fA-F]
UUID          %{HEXDIGIT}{8}-(?:%{HEXDIGIT}{4}-){3}%{HEXDIGIT}{12}
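Patterns such as UUID refer to other patterns by name, so a grok expression has to be expanded recursively before it can be compiled. The sketch below shows one way to do that in Python. It is an assumption about how such an expansion can work, not a description of Panther's implementation: `PATTERNS`, `GROK_REF`, and `expand` are hypothetical names, and HEXDIGIT is written here as `[0-9a-fA-F]`.

```python
import re

# A few built-in patterns from the General table.
PATTERNS = {
    "HEXDIGIT": r"[0-9a-fA-F]",
    "UUID": r"%{HEXDIGIT}{8}-(?:%{HEXDIGIT}{4}-){3}%{HEXDIGIT}{12}",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expr: str) -> str:
    """Recursively replace %{NAME} and %{NAME:field} with plain regex."""

    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(PATTERNS[name])
        # Wrap every expansion in a group so that a following quantifier
        # and any top-level alternation inside the pattern stay contained.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return GROK_REF.sub(repl, expr)


regex = expand("%{UUID:request_id} %{GREEDYDATA:message}")
match = re.fullmatch(regex, "123e4567-e89b-12d3-a456-426614174000 job started")
print(match.groupdict())
```

Wrapping each expansion in a group matters: without it, `%{HEXDIGIT}{8}` would still work, but a pattern with a top-level `|` would change the meaning of everything around it.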

Numbers

Name       Regex
INT        [+-]?(?:[0-9]+)
BASE10NUM  [+-]?(?:[0-9]+(?:\.[0-9]+)?)|\.[0-9]+
NUMBER     %{BASE10NUM}
BASE16NUM  (?:0[xX])?%{HEXDIGIT}+
POSINT     \b[1-9][0-9]*\b
NONNEGINT  \b[0-9]+\b

Network

Name        Regex
CISCOMAC    (?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}
WINDOWSMAC  (?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}
COMMONMAC   (?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}
MAC         %{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}
IPV6        \b(?:(?:(?:%{HEXDIGIT}{1,4}:){7}(?:%{HEXDIGIT}{1,4}|:))|(?:(?:%{HEXDIGIT}{1,4}:){6}(?::%{HEXDIGIT}{1,4}|(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(?:(?:%{HEXDIGIT}{1,4}:){5}(?:(?:(?::%{HEXDIGIT}{1,4}){1,2})|:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|((%{HEXDIGIT}{1,4}:){4}(((:%{HEXDIGIT}{1,4}){1,3})|((:%{HEXDIGIT}{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){3}(((:%{HEXDIGIT}{1,4}){1,4})|((:%{HEXDIGIT}{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){2}(((:%{HEXDIGIT}{1,4}){1,5})|((:%{HEXDIGIT}{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){1}(((:%{HEXDIGIT}{1,4}){1,6})|((:%{HEXDIGIT}{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:%{HEXDIGIT}{1,4}){1,7})|((:%{HEXDIGIT}{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\b
IPV4INT     25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]
IPV4        \b(?:(?:%{IPV4INT})\.){3}(?:%{IPV4INT})\b
IP          %{IPV6}|%{IPV4}
HOSTNAME    \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST    %{IP}|%{HOSTNAME}
HOSTPORT    %{IPORHOST}:%{POSINT}
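As an illustration of how IPV4 is built from IPV4INT, the following Python sketch composes the two by hand and checks a few inputs. It assumes the separating dots are escaped as `\.` so that they match a literal dot only. `is_ipv4` is a hypothetical helper, and Python's `re` stands in for RE2 here.

```python
import re

# One octet, 0 to 255, as defined by IPV4INT.
IPV4INT = r"25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]"

# IPV4 composed from IPV4INT. Each use of IPV4INT is wrapped in (?:...) so
# that its alternation does not swallow the surrounding dots and anchors.
IPV4 = re.compile(rf"\b(?:(?:{IPV4INT})\.){{3}}(?:{IPV4INT})\b")


def is_ipv4(text: str) -> bool:
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


for candidate in ["10.10.0.117", "256.1.1.1", "10x10x0x117"]:
    print(candidate, is_ipv4(candidate))
```

With an unescaped `.` the last candidate would be accepted as well, which is why the escaping matters when copying these patterns around.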

URI

Name          Regex
USERNAME      [a-zA-Z0-9._-]+
UNIXPATH      (?:/[\w_%!$@:.,-]?/?)(\S+)?
WINPATH       (?:[A-Za-z]:|\\)(?:\\[^\\?*]*)+
PATH          (?:%{UNIXPATH}|%{WINPATH})
TTY           (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
URIPROTO      [A-Za-z]+(?:\+[A-Za-z+]+)?
URIHOST       %{IPORHOST}(?::%{POSINT})?
URIPATH       (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_-]*)+
URIPARAM      \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
URIPATHPARAM  %{URIPATH}(?:%{URIPARAM})?
URI           %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

Timestamps

Name                Regex
MONTH               \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM            0?[1-9]|1[0-2]
MONTHNUM2           0[1-9]|1[0-2]
MONTHDAY            (?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]
DAY                 \b(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)\b
YEAR                (?:\d\d){1,2}
HOUR                2[0123]|[01]?[0-9]
MINUTE              [0-5][0-9]
SECOND              (?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?
KITCHEN             %{HOUR}:%{MINUTE}
TIME                %{HOUR}:%{MINUTE}:%{SECOND}
DATE_US             %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU             %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE    (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND      (?:%{SECOND}|60)
TIMESTAMP_ISO8601   %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE                %{DATE_US}|%{DATE_EU}
DATETIME            %{DATE}[- ]%{TIME}
TZ                  [A-Z]{3}
TZOFFSET            [+-]\d{4}
TIMESTAMP_RFC822    %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
TIMESTAMP_RFC2822   %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
TIMESTAMP_OTHER     %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
TIMESTAMP_EVENTLOG  %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}
SYSLOGTIMESTAMP     %{MONTH} +%{MONTHDAY} %{TIME}
HTTPDATE            %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{TZOFFSET}
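Several timestamp building blocks (MONTHDAY, HOUR) contain a top-level alternation, so they must be grouped when combined into a larger pattern. The sketch below composes SYSLOGTIMESTAMP by hand in Python to show this, using timestamps from the Juniper.Audit samples earlier on this page. The `group` helper is hypothetical, and whether Panther groups expansions in exactly this way is an assumption.

```python
import re

MONTH = (
    r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
    r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b"
)
MONTHDAY = r"(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]"
HOUR = r"2[0123]|[01]?[0-9]"
MINUTE = r"[0-5][0-9]"
SECOND = r"(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?"


def group(pattern: str) -> str:
    """Wrap a pattern so its top-level alternation cannot leak outward."""
    return f"(?:{pattern})"


# TIME            = %{HOUR}:%{MINUTE}:%{SECOND}
# SYSLOGTIMESTAMP = %{MONTH} +%{MONTHDAY} %{TIME}
TIME = f"{group(HOUR)}:{group(MINUTE)}:{group(SECOND)}"
SYSLOGTIMESTAMP = re.compile(f"{group(MONTH)} +{group(MONTHDAY)} {TIME}")

for text in ["Jan 22 16:14:23", "Feb 7 20:29:51", "Foo 7 20:29:51"]:
    print(text, SYSLOGTIMESTAMP.fullmatch(text) is not None)
```

The ` +` between month and day also accepts the double space that some syslog writers use to pad single-digit days.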

Aliases

Name  Equivalent To
NS    NOTSPACE
QS    QUOTEDSTRING
HOST  HOSTNAME
PID   POSINT
USER  USERNAME
