Regex Log Parser
Overview
For text log types with more complex structure, you can use the regex parser.
The regex parser uses named groups in regular expressions to extract field values from each line of text. You can use grok syntax (i.e., %{PATTERN_NAME:field_name}) to build complex expressions, either taking advantage of the built-in patterns provided by Panther or defining your own.
Panther's log processor uses the RE2 syntax for regular expressions. RE2 does not support some operations common to other regular expression engines, such as lookbehind. Be sure to check any expressions or grok patterns you copy and paste from other systems.
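Before pasting a pattern from another system, it can help to scan it for constructs that RE2 rejects. The following is a minimal Python sketch of such a check. The helper name find_unsupported and the list of probes are assumptions made for illustration, not part of Panther; the check is a heuristic and does not replace testing the pattern against Panther itself.

```python
import re

# Constructs that RE2 does not accept. This list is a heuristic, not exhaustive,
# and the probes can report false positives (e.g. an escaped backslash before a digit).
UNSUPPORTED = [
    ("lookahead", re.compile(r"\(\?[=!]")),
    ("lookbehind", re.compile(r"\(\?<[=!]")),
    ("atomic group", re.compile(r"\(\?>")),
    ("backreference", re.compile(r"\\[1-9]")),
]


def find_unsupported(pattern):
    """Return the names of constructs found in `pattern` that RE2 rejects."""
    return [name for name, probe in UNSUPPORTED if probe.search(pattern)]


# A lookbehind copied from a PCRE-style engine is flagged...
print(find_unsupported(r"(?<=user=)\w+"))
# ...while the same extraction written with a named group is not.
print(find_unsupported(r"user=(?P<user>\w+)"))
```

Rewriting a lookbehind as literal text followed by a named group, as in the second call, is usually enough for field extraction.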
For example, to match the text
We can use this grok syntax with this pattern:
Which is the rough equivalent of this 'raw' regular expression:
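The relationship between a grok expression and its raw regular expression can be sketched in Python as follows. The log line, the field names, and the expand_grok helper are hypothetical, and Python's re module stands in for Panther's RE2-based processor; only the WORD, NOTSPACE, and INT patterns and the NS alias are taken from the built-in pattern reference on this page.

```python
import re

# A small subset of the built-in patterns listed in the reference on this page.
PATTERNS = {
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "INT": r"[+-]?(?:[0-9]+)",
    "NS": "%{NOTSPACE}",  # alias, resolved recursively
}

# Matches %{NAME} and %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand_grok(expr, patterns=PATTERNS):
    """Expand grok references into a raw regular expression (hypothetical helper)."""
    def replace(match):
        name, field = match.group(1), match.group(2)
        body = expand_grok(patterns[name], patterns)
        # A field name becomes a named group; a bare reference stays non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(replace, expr)


grok = "%{WORD:user} logged in from %{NS:host} after %{INT:attempts} attempts"
raw = expand_grok(grok)
print(raw)

# Hypothetical log line used only for this illustration.
fields = re.match(raw, "alice logged in from host-1 after 3 attempts").groupdict()
print(fields)
```

Each %{PATTERN_NAME:field_name} reference turns into one named group, which is how the parser knows which field a matched value belongs to.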
For best performance, stick to simple built-in patterns such as DATA, NOTSPACE, GREEDYDATA, and WORD. Avoid complex expressions unless they are required to distinguish the field name based on the value (e.g., (%{IP:ip_address}|%{WORD:username})).
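As an illustration of choosing the field name based on the value, the sketch below writes out an alternation by hand in Python. It is simplified on purpose: it uses only IPV4 where the expression above uses IP, the classify helper is hypothetical, and Python's re module stands in for Panther's RE2-based processor.

```python
import re

# IPV4INT, IPV4 and WORD as listed in the built-in pattern reference.
IPV4INT = r"25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]"
IPV4 = rf"\b(?:(?:{IPV4INT})\.){{3}}(?:{IPV4INT})\b"
WORD = r"\b\w+\b"

# Hand-expanded, IPv4-only stand-in for (%{IP:ip_address}|%{WORD:username}).
ACTOR = re.compile(rf"(?:(?P<ip_address>{IPV4})|(?P<username>{WORD}))")


def classify(value):
    """Return the single field that `value` was assigned to (hypothetical helper)."""
    match = ACTOR.fullmatch(value)
    if match is None:
        return {}
    # Only the branch that matched has a non-None group.
    return {name: text for name, text in match.groupdict().items() if text is not None}


print(classify("10.0.0.1"))
print(classify("alice"))
```

The more specific branch comes first so that an address is not captured by the broader WORD branch.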
Example using regex
Using the regex parser, we will define a log type for Juniper.Audit logs. Panther already supports these logs natively, but we use them here because they come in several conflicting forms that can only be handled by the regex parser.
The sample logs for Juniper.Audit are:
Here is how we would define a log schema for these logs using regex:
In the Panther Console, we would follow the How to create a custom schema manually instructions, selecting the Regex parser.
In the Fields & Indicators section (below the Parser section shown in the screenshot above), we would define the fields:
Built-in regex pattern reference
The following tables detail the built-in Panther regex patterns you can use.
General
Pattern         Regex
DATA            .*?
GREEDYDATA      .*
NOTSPACE        \S+
SPACE           \s*
WORD            \b\w+\b
QUOTEDSTRING    "(?:\\.|[^\\"]+)+"|""|'(?:\\.|[^\\']+)+'|''
HEXDIGIT        [0-9a-fA-F]
UUID            %{HEXDIGIT}{8}-(?:%{HEXDIGIT}{4}-){3}%{HEXDIGIT}{12}
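The practical difference between DATA (.*?) and GREEDYDATA (.*) is how much text each one consumes before the next literal in the expression. A minimal Python sketch, using a hypothetical log line and hand-written named groups in place of grok references:

```python
import re

line = "key=a b c end=1 end=2"  # hypothetical log line

# DATA is lazy: it stops at the first position where the rest of the pattern matches.
lazy = re.match(r"key=(?P<value>.*?) end=", line)
# GREEDYDATA is greedy: it extends to the last position where the rest still matches.
greedy = re.match(r"key=(?P<value>.*) end=", line)

print(lazy.group("value"))    # a b c
print(greedy.group("value"))  # a b c end=1
```

DATA suits fields in the middle of a line that are followed by a known delimiter, while GREEDYDATA suits the final field that should take everything that remains.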
Numbers
Pattern         Regex
INT             [+-]?(?:[0-9]+)
BASE10NUM       [+-]?(?:[0-9]+(?:\.[0-9]+)?)|\.[0-9]+
NUMBER          %{BASE10NUM}
BASE16NUM       (?:0[xX])?%{HEXDIGIT}+
POSINT          \b[1-9][0-9]*\b
NONNEGINT       \b[0-9]+\b
Network
Pattern         Regex
CISCOMAC        (?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}
WINDOWSMAC      (?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}
COMMONMAC       (?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}
MAC             %{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}
IPV6            \b(?:(?:(?:%{HEXDIGIT}{1,4}:){7}(?:%{HEXDIGIT}{1,4}|:))|(?:(?:%{HEXDIGIT}{1,4}:){6}(?::%{HEXDIGIT}{1,4}|(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(?:(?:%{HEXDIGIT}{1,4}:){5}(?:(?:(?::%{HEXDIGIT}{1,4}){1,2})|:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|((%{HEXDIGIT}{1,4}:){4}(((:%{HEXDIGIT}{1,4}){1,3})|((:%{HEXDIGIT}{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){3}(((:%{HEXDIGIT}{1,4}){1,4})|((:%{HEXDIGIT}{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){2}(((:%{HEXDIGIT}{1,4}){1,5})|((:%{HEXDIGIT}{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((%{HEXDIGIT}{1,4}:){1}(((:%{HEXDIGIT}{1,4}){1,6})|((:%{HEXDIGIT}{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:%{HEXDIGIT}{1,4}){1,7})|((:%{HEXDIGIT}{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\b
IPV4INT         25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]
IPV4            \b(?:(?:%{IPV4INT})\.){3}(?:%{IPV4INT})\b
IP              %{IPV6}|%{IPV4}
HOSTNAME        \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST        %{IP}|%{HOSTNAME}
HOSTPORT        %{IPORHOST}:%{POSINT}
URI
Pattern         Regex
USERNAME        [a-zA-Z0-9._-]+
UNIXPATH        (?:/[\w_%!$@:.,-]?/?)(\S+)?
WINPATH         (?:[A-Za-z]:|\\)(?:\\[^\\?*]*)+
PATH            (?:%{UNIXPATH}|%{WINPATH})
TTY             (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
URIPROTO        [A-Za-z]+(?:\+[A-Za-z+]+)?
URIHOST         %{IPORHOST}(?::%{POSINT})?
URIPATH         (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_-]*)+
URIPARAM        \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
URIPATHPARAM    %{URIPATH}(?:%{URIPARAM})?
URI             %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
Timestamps
Pattern             Regex
MONTH               \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM            0?[1-9]|1[0-2]
MONTHNUM2           0[1-9]|1[0-2]
MONTHDAY            (?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]
DAY                 \b(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)\b
YEAR                (?:\d\d){1,2}
HOUR                2[0123]|[01]?[0-9]
MINUTE              [0-5][0-9]
SECOND              (?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?
KITCHEN             %{HOUR}:%{MINUTE}
TIME                %{HOUR}:%{MINUTE}:%{SECOND}
DATE_US             %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU             %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE    (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND      (?:%{SECOND}|60)
TIMESTAMP_ISO8601   %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE                %{DATE_US}|%{DATE_EU}
DATETIME            %{DATE}[- ]%{TIME}
TZ                  [A-Z]{3}
TZOFFSET            [+-]\d{4}
TIMESTAMP_RFC822    %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
TIMESTAMP_RFC2822   %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
TIMESTAMP_OTHER     %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
TIMESTAMP_EVENTLOG  %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}
SYSLOGTIMESTAMP     %{MONTH} +%{MONTHDAY} %{TIME}
HTTPDATE            %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{TZOFFSET}
Aliases
Alias   Pattern
NS      NOTSPACE
QS      QUOTEDSTRING
HOST    HOSTNAME
PID     POSINT
USER    USERNAME