Log Schema Reference

This guide describes the common fields used to build YAML-based schemas when onboarding Custom Log Types and Lookup Tables.

Required fields throughout this page are in bold.

LogSchema fields

Each log schema contains the following fields:

  • fields ([]FieldSchema)

    • The fields in each Log Event.

  • parser (ParserSpec)

    • A parser that can convert non-JSON logs to JSON and/or perform custom transformations.

CI/CD schema fields

Additionally, schemas defined using a CI/CD workflow can contain the following fields:

  • schema (string)

    • The name of the schema.

  • description (string)

    • A short description that will appear in the UI.

  • referenceURL (string)

    • A link to an external document that specifies the log structure. Often, this is a link to a third party's documentation.

  • fieldDiscoveryEnabled (boolean)

See the Custom Logs page for information on how to manage schemas through a CI/CD pipeline using Panther Analysis Tool (PAT).

Example

The example below contains the CI/CD fields mentioned above.

schema: Custom.MySchema
description: (Optional) A handy description so I know what the schema is for.
referenceURL: (Optional) A link to some documentation on the logs this schema is for.
fieldDiscoveryEnabled: true
parser:
  csv:
    delimiter: ','
    hasHeader: true
fields:
- name: action
  type: string
  required: true
- name: time
  type: timestamp
  timeFormats:
    - unix

ParserSpec

A ParserSpec specifies a parser to use to convert non-JSON input to JSON. Only one of the following fields can be specified:

  • fastmatch (FastmatchParser{}): Use fastmatch parser

  • regex (RegexParser{}): Use regex parser

  • csv (CSVParser{}): Use csv parser

    • Note: The columns field is required when there are multiple CSV schemas in the same log source.

    • Learn more on CSV Log Parser.

  • script: Use script parser
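
As a rough illustration of the note on columns, the sketch below declares the column names explicitly for a CSV source without a header row. The delimiter and hasHeader keys come from the example earlier on this page. The shape of columns (a list of names) and the names themselves are assumptions, so check the CSV Log Parser page before relying on it.

```yaml
parser:
  csv:
    delimiter: ','
    hasHeader: false
    # Assumed shape: one entry per CSV column, in order.
    # The names are hypothetical and would match entries under `fields`.
    columns:
      - action
      - time
```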

The fields of the fastmatch parser are listed below.

Parser fastmatch fields

  • match ([]string): One or more patterns to match log lines against. This field cannot be empty.

  • emptyValues ([]string): Values to consider as null.

  • expandFields (map[string]string): Additional fields to be injected by expanding text templates.

  • trimSpace (bool): Trim space surrounding each value.
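
A minimal sketch of how these four fields might fit together for an Apache-style access log line. The %{name} placeholder syntax in match and expandFields, and every field name used, are assumptions for illustration and are not defined on this page.

```yaml
parser:
  fastmatch:
    # At least one pattern is required. The %{name} captures are an assumed syntax.
    match:
      - '%{remote_ip} %{identity} %{user} [%{timestamp}] "%{method} %{request_uri} %{protocol}" %{status} %{bytes_sent}'
    # Treat a lone dash as null.
    emptyValues: [ '-' ]
    # Build an extra field from captured values (hypothetical template).
    expandFields:
      request_line: '%{method} %{request_uri} %{protocol}'
    # Strip whitespace around each captured value.
    trimSpace: true
```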

FieldSchema

A FieldSchema defines a field and its value. The field is defined by:

  • name (string)

    • The name of the field.

  • required (boolean)

    • Whether the field is required.

  • description (string)

    • Some text documenting the field.

  • copy (object)

    • If present, the field's value will be copied from the referenced object.

  • rename (object)

    • If present, the field's name will be changed.

  • concat (object)

    • If present, the field's value will be the combination of the values of two or more other fields.

  • split (object)

    • If present, the field's value will be extracted from another string field by splitting it based on a separator.

  • mask (object)

    • If present, the field's value will be masked.

Its value is defined using the fields of a ValueSchema.
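
A minimal sketch of a single FieldSchema entry, using only name, required, and description together with a type from the ValueSchema. The field name is hypothetical. The transformation keys (copy, rename, concat, split, mask) are left out because their sub-fields are not described on this page.

```yaml
fields:
  - name: action                 # hypothetical field name
    required: true
    description: The action the user performed.
    type: string                 # the value side, described by ValueSchema
```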

ValueSchema

A ValueSchema defines a value and how it should be processed. Each ValueSchema has a type field that takes one of the following values:

  • string

    • A string value.

  • int

    • A 32-bit integer number in the range -2147483648 to 2147483647.

  • smallint

    • A 16-bit integer number in the range -32768 to 32767.

  • bigint

    • A 64-bit integer number in the range -9223372036854775808 to 9223372036854775807.

  • float

    • A 64-bit floating point number.

  • boolean

    • A boolean value, true or false.

  • timestamp

    • A timestamp value.

  • array

    • A JSON array where each element is of the same type.

  • object

    • A JSON object of known keys.

  • json

    • Any valid JSON value (JSON object, array, number, string, boolean).

The fields of a ValueSchema depend on the value of the type.

  • object

    • fields (required): An array of FieldSpec objects describing the fields of the object.

  • array

    • element (required): A ValueSchema describing the elements of an array.

  • timestamp

    • timeFormats ([]String, required): An array specifying the formats to use for parsing the timestamp (see Timestamps).

    • isEventTime (Boolean): A flag to tell Panther to use this timestamp as the Log Event Timestamp.

  • string

    • indicators ([]String): Tells Panther to extract indicators from this value (see Indicators).

    • validate: Validation rules for the string value.
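
The type-dependent fields above can be sketched in one schema fragment: an object with nested fields, an array with an element definition, and a free-form json value. All field names are hypothetical and chosen only for illustration.

```yaml
fields:
  # object: requires `fields`, a nested list of field definitions
  - name: user
    type: object
    fields:
      - name: id
        type: bigint
      - name: name
        type: string
  # array: requires `element`, a ValueSchema applied to every item
  - name: tags
    type: array
    element:
      type: string
  # json: accepts any valid JSON value, so no extra keys are needed
  - name: payload
    type: json
```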

Timestamps

The timeFormats field was introduced in version 1.46 to support multiple timestamp formats in custom log schemas. While timeFormat is still supported for existing log sources, we recommend using timeFormats for all new schemas.

Timestamps are defined by setting the type field to timestamp and specifying the timestamp format using the timeFormats field. Timestamp formats can be any of the built-in timestamp formats:

  • rfc3339

    • Example: 2022-04-04T17:09:17Z

    • The most common timestamp format.

  • unix_auto

    • Examples: 1649097448 (seconds), 1649097491531 (milliseconds), 1649097442000000 (microseconds), 1649097442000000000 (nanoseconds)

    • Timestamp expressed in time passed since UNIX epoch time. It can handle seconds, milliseconds, microseconds, and nanoseconds.

  • unix

    • Example: 1649097448

    • Timestamp expressed in seconds since UNIX epoch time. It can handle fractions of seconds as a decimal part.

  • unix_ms

    • Example: 1649097491531

    • Timestamp expressed in milliseconds since UNIX epoch time.

  • unix_us

    • Example: 1649097442000000

    • Timestamp expressed in microseconds since UNIX epoch time.

  • unix_ns

    • Example: 1649097442000000000

    • Timestamp expressed in nanoseconds since UNIX epoch time. Scientific float notation is supported.

Defining a custom format

You can also define a custom format by using strftime notation. For example:

# The field is a timestamp using a custom timestamp format like "2020-09-14 14:29:21"
- name: ts
  type: timestamp
  timeFormats:
    - "%Y-%m-%d %H:%M:%S" # note the quotes required for proper YAML syntax

Panther's strftime format supports the %N code for parsing nanoseconds. For example:

%H:%M:%S.%N can be used to parse 11:12:13.123456789
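
Combining the %N code with the custom date format shown earlier, a full timestamp field might look like the sketch below. The field name and the sample value in the comment are illustrative assumptions.

```yaml
# Would parse a value such as "2020-09-14 14:29:21.123456789"
- name: ts
  type: timestamp
  timeFormats:
    - "%Y-%m-%d %H:%M:%S.%N" # quoted so YAML treats it as a plain string
```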

Using multiple time formats

When multiple time formats are defined, each of them will be tried sequentially until successful parsing is achieved:

- name: ts
  type: timestamp
  timeFormats:
    - rfc3339
    - unix

Timestamp values can be marked with isEventTime: true to tell Panther that it should use this timestamp as the p_event_time field. It is possible to set isEventTime on multiple fields. This covers the cases where some logs have optional or mutually exclusive fields holding event time information. Since there can only be a single p_event_time for every Log Event, the priority is defined using the order of fields in the schema.
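
The sketch below marks two timestamp fields with isEventTime. The field names are hypothetical. Because priority follows the order of fields in the schema, the assumption here is that the first listed field is preferred and the second is used only when the first has no value.

```yaml
fields:
  # Listed first, so assumed to take priority for p_event_time
  - name: completed_at
    type: timestamp
    timeFormats:
      - rfc3339
    isEventTime: true
  # Assumed fallback when completed_at is missing from the event
  - name: created_at
    type: timestamp
    timeFormats:
      - rfc3339
    isEventTime: true
```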

Schema test cases that are used with the pantherlog test command must define the time field value in the result payload, formatted as YYYY-MM-DD HH:MM:SS.fffffffff. For backwards compatibility reasons, schemas that define a single time format keep the value in its original input format.

Example with a single time format:

- name: singleFormatTimestamp
  type: timestamp
  timeFormats:
    - unix
input: >
  {
    "singleFormatTimestamp": "1666613239"
  }
result: >
  {
    "singleFormatTimestamp": "1666613239"
  }

When multiple time formats are defined:

- name: multipleFormatTimestamp
  type: timestamp
  timeFormats:
    - unix
    - rfc3339
input: >
  {
    "multipleFormatTimestamp": "1666613239"
  }
result: >
  {
    "multipleFormatTimestamp": "2022-10-24 12:07:19.000000000"
  }

Indicators

Values of string type can be used as indicators. To mark a field as an indicator, set the indicators field to an array of indicator scanner names (more than one may be used). This will instruct Panther to parse the string and store any indicator values it finds to the relevant p_any_ field. For a list of values that are valid to use in the indicators field, see Standard Fields.

For example:

# Will scan the value as IP address and store it to `p_any_ip_addresses`
- name: remote_ip
  type: string
  indicators: [ ip ]

# Will scan the value as a domain name and/or IP address.
# Will store the result in `p_any_domain_names` and/or `p_any_ip_addresses`
- name: target_url
  type: string
  indicators: [ url ]

Validation - Allow/Deny lists

Values of string type can be further restricted by declaring a list of values to allow or deny. This makes it possible to define different log types that share overlapping fields but differ in the values of those fields.

# Will only allow 'login' and 'logout' event types to match this log type
- name: event_type
  type: string
  validate:
    allow: [ "login", "logout"]
    
# Will match any event type other than 'login' and 'logout'
- name: event_type
  type: string
  validate:
    deny: [ "login", "logout"]

Validation by string type

Values of string type can be restricted to match well-known formats. Currently, Panther supports the ip and cidr formats, which require that a string value be a valid IP address or CIDR range. The ip and cidr validation types can be combined with allow or deny rules, but doing so is somewhat redundant. For example, if you allow two IP addresses, adding an ip validation only guards against false positives caused by invalid IP addresses in your list.

# Will allow valid ipv4 IP addresses e.g. 100.100.100.100
- name: address
  type: string
  validate:
    ip: "ipv4"
    
# Will allow valid ipv6 CIDR ranges 
# e.g. 2001:0db8:85a3:0000:0000:0000:0000:0000/64
- name: address
  type: string
  validate:
    cidr: "ipv6"
    
# Will allow any valid ipv4 or ipv6 address
- name: address
  type: string
  validate:
    ip: "any"    
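
As a sketch of the combination described above, the fragment below pairs an ip format check with an allow list. The addresses are placeholder values, and placing both keys under the same validate block is an assumption based on the separate examples on this page.

```yaml
# Will only match the two listed addresses, which must also be valid ipv4
- name: address
  type: string
  validate:
    ip: "ipv4"
    allow: [ "10.0.0.1", "10.0.0.2" ]
```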

Using JSON schema in an IDE

If your code editor or integrated development environment (IDE) supports JSON Schema, you can configure it to use this schema file for Panther schemas and this schema-tests file for schema tests. Doing so will allow you to receive suggestions and error messages while developing Panther schemas and their tests.

JetBrains custom JSON schemas

See the JetBrains documentation for instructions on how to configure JetBrains IDEs to use custom JSON Schemas.

VSCode custom JSON schemas

See the VSCode documentation for instructions on how to configure VSCode to use JSON Schemas.

Stream type

While performing certain actions in the Panther Console, such as configuring an S3 bucket for Data Transport or inferring a custom schema from raw logs, you need to select a log stream type.

View example log events for each type below.

Stream type

Description

Example log event(s)

Auto

Panther will automatically detect the appropriate stream type.

n/a

Lines

Events are separated by a new line character.

"10.0.0.1","[email protected]","France"
"10.0.0.2","[email protected]","France"
"10.0.0.3","[email protected]","France"

JSON

Events are in JSON format.

{ 
"ip": "10.0.0.1", 
"un": "[email protected]", 
"country": "France" 
}
or
{ "ip": "10.0.0.1", "un": "[email protected]", "country": "France" }{ "ip": "10.0.0.2", "un": "[email protected]", "country": "France" }{ "ip": "10.0.0.3", "un": "[email protected]", "country": "France" }
or
{ "ip": "10.0.0.1", "un": "[email protected]", "country": "France" }
{ "ip": "10.0.0.2", "un": "[email protected]", "country": "France" }
{ "ip": "10.0.0.3", "un": "[email protected]", "country": "France" }

JSON Array

Events are inside an array of JSON objects.

[
	{ "ip": "10.0.0.1", "username": "[email protected]", "country": "France" },
	{ "ip": "10.0.0.2", "username": "[email protected]", "country": "France" },
	{ "ip": "10.0.0.3", "username": "[email protected]", "country": "France" }
]

CloudWatch Logs

Events come from CloudWatch Logs.

{
  "owner": "111111111111",
  "logGroup": "services/foo/logs",
  "logStream": "111111111111_CloudTrail/logs_us-east-1",
  "messageType": "DATA_MESSAGE",
  "logEvents": [
      {
          "id": "31953106606966983378809025079804211143289615424298221568",
          "timestamp": 1432826855000,
          "message": "{\"ip\": \"10.0.0.1\", \"user\": \"[email protected]\", \"country\": \"France\"}"
      },
      {
          "id": "31953106606966983378809025079804211143289615424298221569",
          "timestamp": 1432826855000,
          "message": "{\"ip\": \"10.0.0.2\", \"user\": \"[email protected]\", \"country\": \"France\"}"
      },
      {
          "id": "31953106606966983378809025079804211143289615424298221570",
          "timestamp": 1432826855000,
          "message": "{\"ip\": \"10.0.0.3\", \"user\": \"[email protected]\", \"country\": \"France\"}"
      }
  ]
}
