
Custom Logs

Define, write, and manage custom schemas

Overview

Panther allows users to define their own log types by creating a Custom Schema. This page explains how to define, write, and manage custom schemas, as well as how to use the pantherlog CLI tool and how to upload schemas with Panther Analysis Tool (PAT).
Custom Schemas are identified by a Custom. prefix in their name and can be used wherever a native Log Type is used:
  • You can use a Custom Schema when onboarding data through S3, SQS or GCS.
  • You can write Rules for Custom Schemas.
  • You can query the data in Data Explorer. Panther will create a new table for the Custom Schema once you onboard a source that uses it.
  • You can query the data through Indicator Search.
Panther supports JSON and CSV (with or without headers) for custom log types. Note, however, that Panther cannot infer schemas from CSV without headers.

How to define a schema

You can define a schema via the following:
  • S3 data
  • Sample logs
  • Manual creation

Generating a schema from sample logs

You can generate a schema by uploading sample logs into the Panther Console. To get started, follow these steps:
  1. Log in to your Panther account.
  2. On the left sidebar, navigate to Configure > Schemas.
  3. At the top right of the page next to the search bar, click Create New.
  4. On the New Data Schema page, enter a Schema ID, Description, and Reference URL.
    • The Description is meant for content about the table, while the Reference URL can be used to link to internal resources.
  5. Scroll to the bottom of the page where you'll find the option to upload sample log files.
  6. Upload a sample set of logs: Drag a file from your computer over the "Infer schema from sample logs" box or click Select file and choose the log file. Note that Panther does not support CSV without headers for inferring schemas.
    • After uploading a file, Panther will display the raw logs in the UI. You can expand the log lines to view the entire raw log. Note that if you add another sample set, it will override the previously uploaded sample.
  7. Click Infer Schema from All Logs.
    • Panther will begin to infer a schema from the raw sample logs. Depending on the number of logs uploaded, this could take some time (e.g., 30 seconds for 100 logs).
    • Once the schema is generated, it will appear in the schema editor box above the raw logs.
  8. To ensure the schema works properly against the sample logs you uploaded and against any changes you make to the schema, click Validate & Test Schema.
    • This test validates that the syntax of your schema is correct and that the log samples you uploaded successfully match against the schema. The results appear below the schema editor box.
    • All successfully matched logs appear under Matched; each log displays the column, field, and JSON view.
    • All unsuccessfully matched logs appear under Unmatched; each log displays the error message and the raw log.
  9. Click Save to publish the schema.
Note that the UI will only display up to 100 logs. This does not mean Panther can only infer from 100 logs; Panther infers from all uploaded logs. The display limit is a performance measure to keep schema generation responsive.

Generating and testing a data schema for a Custom Log from data in S3

You can generate a schema for a custom log from live data streaming from an S3 bucket and then test that schema before publishing.

View raw data from S3

After onboarding your S3 bucket onto Panther, you can view raw data coming into Panther before processing. To view the raw data and kick off schema inference/testing, proceed with the following:
  1. Follow the instructions to onboard an S3 bucket onto Panther without having a schema in place.
    • Skip the step where you list schemas and prefixes in the first page of the S3 onboarding wizard.
    • You'll have the opportunity to add schemas and prefixes after the S3 bucket is onboarded.
  2. While viewing your log source, click Schemas on the left.
  3. Follow the options on the screen to configure schemas. Choose from the following options:
    • I already know schemas and prefixes: Choose this option if you already created a schema and you know the S3 prefix you want Panther to read logs from. Click Start in the tile.
    • I want to generate a schema from raw events: Select this option to generate a schema from live data in this bucket and define which prefixes you want Panther to read logs from. Click Start in the tile.
    • Create new schema: You can also create your own schema from the Schemas page and return here to attach it. Click Create New Schema below the tile options.
      Note: You may need to wait up to 15 minutes for data to start streaming into Panther.
On the schema inference and testing workflow page, you can view the raw data that Panther has received at the bottom of the screen:
  • This data is displayed from data-archiver, a Panther-managed S3 bucket that retains raw logs for up to 30 days for every S3 log source.
  • If you still do not see data after 15 minutes, ensure that the time picker is set to the appropriate time range that corresponds with the timestamps on the events coming into Panther.

Infer a schema from raw data

Using raw live data coming from S3, you can infer schemas with just a few steps.
  1. Once you see data populating in Raw Events, you can filter the raw events you'd like to infer a schema for by using the time, prefix, or string filter. Set these filters at the top of the raw events table.
  2. Click Infer Schema to generate a schema.
  3. On the Infer New Schema modal that pops up, enter the following:
    • Schema Name: The name of the schema that will map to the table in the data lake once the schema is published.
      • The name will always start with Custom. and must be followed by a capital letter (for example, Custom.MyLogSource).
    • Prefix: Use an existing prefix that was set up prior to inferring the schema, or a new prefix.
      • The prefix you choose will filter data from the corresponding prefix in the S3 bucket to the schema you've inferred.
      • If you don't need a specific prefix, you can leave this field empty to use the catch-all prefix, *.
  4. Click Infer Schema.
    • The schema will be placed in Draft mode until you're ready to publish to production after testing.
  5. Review the schema and its fields by going to the Schemas section and clicking on the schema name.
    • You'll see the name you gave the schema, with a Draft label, after the schema is inferred.
    • Since the schema is in Draft, you can change, remove, or add fields as needed.

Test schema with raw data

Once your schemas and prefixes are defined, you can proceed to testing the schema configuration against raw data.
  1. In the Test Schemas section at the top of the screen, click the Run Test button.
  2. On the Test Schemas modal that pops up, select the Time Period you would like to test your schemas against and click the Start Test button.
    • Depending on the time range and amount of data, the test may take a few minutes to complete.
    • Once the test is started, the results appear with the number of matched and unmatched events.
      • Matched events represent the number of events that would successfully classify against the schema configuration.
      • Unmatched events represent the number of events that would fail to classify.
  3. Inspect the errors and the JSON to determine what caused the failures.
  4. Navigate back to the draft schema, make changes as needed, and test the schemas again.

Adding a Custom Schema manually

To add a Custom Schema manually:
  1. In the Panther Console, navigate to Configure > Schemas.
  2. Click New in the upper right corner.
  3. Enter a name for the Custom Log (e.g., Custom.SampleAPI) and write or paste your YAML Log Schema definition.
  4. Click Validate Syntax at the bottom to verify your schema contains no errors.
    • Note that syntax validation only checks the syntax of the Log Schema. The schema can still fail to save due to a name conflict.
  5. Click Save.
You can now navigate to Configure > Log Sources and add a new source or modify an existing one to use the new Custom.SampleAPI Log Type. Once Panther receives events from this Source, it will process the logs and store the Log Events in the custom_sampleapi table.
You can also now write Rules to match against these logs and query them using the Data Explorer.
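As a reference for step 3 above, a minimal sketch of a YAML definition for the hypothetical Custom.SampleAPI might look like the following (the field names are illustrative, not prescribed):

version: 0
fields:
  - name: time
    description: Event timestamp
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: action
    description: The API action that was requested (hypothetical field)
    type: string
  - name: source_ip
    description: Client IP address (hypothetical field)
    type: string
    indicators: [ ip ]   # valid IPs are appended to p_any_ip_addresses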

Writing schemas

Writing a schema for JSON logs

You can use the pantherlog CLI tool to help you generate your Log Schema.
To parse log files where each line is JSON, you must define a Log Schema that describes the structure of each log entry.
In the example schemas below, the first tab displays the JSON log structure and the second tab shows the Log Schema.
Note: Please leverage the Minified JSON Log Example when using the pantherlog tool or generating a schema within the Panther Console.
JSON Log Example
Log Schema Example
{
  "method": "GET",
  "path": "/-/metrics",
  "format": "html",
  "controller": "MetricsController",
  "action": "index",
  "status": 200,
  "params": [],
  "remote_ip": "1.1.1.1",
  "user_id": null,
  "username": null,
  "ua": null,
  "queue_duration_s": null,
  "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
  "cpu_s": 0.05,
  "db_duration_s": 0,
  "view_duration_s": 0.00039,
  "duration_s": 0.0459,
  "tag": "test",
  "time": "2019-11-14T13:12:46.156Z"
}
Minified JSON log example:
{"method":"GET","path":"/-/metrics","format":"html","controller":"MetricsController","action":"index","status":200,"params":[],"remote_ip":"1.1.1.1","user_id":null,"username":null,"ua":null,"queue_duration_s":null,"correlation_id":"c01ce2c1-d9e3-4e69-bfa3-b27e50af0268","cpu_s":0.05,"db_duration_s":0,"view_duration_s":0.00039,"duration_s":0.0459,"tag":"test","time":"2019-11-14T13:12:46.156Z"}
version: 0
fields:
  - name: time
    description: Event timestamp
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: method
    description: The HTTP method used for the request
    type: string
  - name: path
    description: The path used for the request
    type: string
  - name: remote_ip
    description: The remote IP address the request was made from
    type: string
    indicators: [ ip ] # the value will be appended to `p_any_ip_addresses` if it's a valid ip address
  - name: duration_s
    description: The number of seconds the request took to complete
    type: float
  - name: format
    description: Response format
    type: string
  - name: user_id
    description: The id of the user that made the request
    type: string
  - name: params
    type: array
    element:
      type: object
      fields:
        - name: key
          description: The name of a Query parameter
          type: string
        - name: value
          description: The value of a Query parameter
          type: string
  - name: tag
    description: Tag for the request
    type: string
  - name: ua
    description: UserAgent header
    type: string
You can edit the YAML specifications directly in the Panther Console, or prepare them offline in your editor/IDE of choice. For more information on the structure and fields in a Log Schema, see the Log Schema Reference.

Writing a schema for text logs

Panther handles logs that are not structured as JSON by using a 'parser' that translates each log line into key/value pairs and feeds them as JSON to the rest of the pipeline. You can define a text parser using the parser field of the Log Schema. Panther provides the following parsers for non-JSON formatted logs:
  • fastmatch: Match each line of text against one or more simple patterns.
  • regex: Use regular expression patterns to handle more complex matching, such as conditional fields and case-insensitive matching.
  • csv: Treat log files as CSV, mapping column names to field names.
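For illustration, a headerless, comma-delimited source might pair a csv parser with field definitions like the sketch below. The column names are hypothetical, and the exact parser options should be confirmed against the Log Schema Reference:

version: 0
parser:
  csv:
    delimiter: ","
    # No header row, so column names are declared explicitly (hypothetical names)
    columns:
      - timestamp
      - action
      - source_ip
fields:
  - name: timestamp
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: action
    type: string
  - name: source_ip
    type: string
    indicators: [ ip ]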

Managing custom schemas

Editing a Custom Schema

Panther allows limited editing of a Custom Schema. Specifically:
  • You can modify the parser configuration to fix bugs or add new patterns.
  • You can add new fields to the schema.
  • You can edit, add, or remove all properties of existing fields except the type.
  • You cannot rename existing fields.
  • You cannot delete existing fields (doing so would allow renaming in two steps).
  • You cannot change the type of an existing field (this includes the element type for array fields).
To edit a Custom Schema:
  1. Navigate to your Custom Schema's details page.
  2. Click Edit in the details page.
  3. Modify the YAML.
  4. Click Update to submit your change.
Validate Syntax can check the YAML for structural compliance, but the rules described above can only be checked on Update. The update will be rejected if the rules are not followed.
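To make these rules concrete, the annotated sketch below (with hypothetical field names) marks which kinds of edits would pass an Update and which would be rejected:

version: 0
fields:
  - name: time
    required: true
    type: timestamp        # changing this type would be rejected
    timeFormat: rfc3339    # editing properties other than type is allowed
    isEventTime: true
  - name: request_id       # renaming or deleting this existing field would be rejected
    type: string
  - name: user_agent       # adding a brand-new field like this is allowed
    description: UserAgent header
    type: string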

Disabling a Custom Schema

A Custom Log can be disabled if no source is using it.
A disabled Custom Log is removed from the listing and its tables are hidden from the Data Explorer view.
Disabling a Custom Log does not affect any data already stored in the data lake. All data remains queryable through Data Explorer or Indicator Search. Trying to add a log with the same name at a later time will fail due to the name conflict.
To disable a Custom Log:
  1. Navigate to the details page of the Custom Log type.
  2. Click the Enable toggle to set it to Disabled.

Testing a Panther-managed Schema

The "Test Schema against sample logs" feature found on the Schema Edit page in the Panther Console supports:
  • Line-delimited JSON (JSONL)
  • CSV (with or without headers)
Additionally, the above log formats can be compressed using the following formats:
  • gzip
  • zstd (without dictionary)
Multi-file inputs are not supported, whether raw or compressed in one of the above formats.
Need to validate that a Panther-managed schema will work against your logs? You can test sample logs against the Panther-managed schema similarly to testing logs against a custom schema/log type (as described above). Follow the steps below:
  1. In the Panther Console, go to Configure > Schemas.
  2. Click a schema labeled as Panther-managed.
  3. On the schema details page, scroll to the bottom of the page, where you'll be able to upload logs.

Using Pantherlog CLI

Panther provides a CLI tool, pantherlog, to help you work with Custom Logs. You can download the executable from the panther-community S3 bucket.
For more information on using Panther's CLI tools, see Panther's Operations documentation.

Generating a schema from JSON samples

You can use the tool to generate a schema file from sample files in newline-delimited JSON (JSONL) format. The tool scans the provided logs and prints the inferred schema to stdout.
For example, to infer the schema of logs sample_logs.jsonl and output to schema.yml, use:
$ ./pantherlog infer sample_logs.jsonl > schema.yml
Note that YAML keys and values are case sensitive.
WARNING: The tool has the following limitations:
  • It will identify a string as a timestamp only if the string is in RFC3339 format. Make sure to review the generated schema and identify fields that should be of type timestamp instead.
  • It will not mark any timestamp field as isEventTime: true. Make sure to select the appropriate timestamp field and mark it as isEventTime: true. For more information regarding isEventTime: true, see timestamp.
  • It can only infer three types of indicators: ip, aws_arn, and url. Make sure to review the fields and add more indicators as appropriate.
Make sure to review the generated schema and edit it appropriately before deploying to your production environment!
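For example, given the first limitation above, a Unix-epoch field that the tool left as a number would need to be promoted by hand into the event timestamp. A hypothetical before-and-after edit:

# As inferred by the tool (created_at is a hypothetical field holding epoch seconds):
# - name: created_at
#   type: bigint
#
# After manual review:
- name: created_at
  type: timestamp
  timeFormat: unix
  isEventTime: true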
The workflow of inferring a schema from sample logs

Trying out a schema

You can use the tool to validate a schema file and use it to parse log files. Note that the events in the log files need to be separated by newlines. Processed logs are written to stdout and errors to stderr.
For example, to parse logs in sample_logs.jsonl with the log schema in schema.yml, use:
$ ./pantherlog parse --path schema.yml --schemas Schema.Name sample_logs.jsonl
The tool can also accept input via stdin so it can be used in a pipeline:
$ cat sample_logs.jsonl | ./pantherlog parse --path schema.yml

Running tests for a schema

You can use the tool to run unit tests. You can define unit tests for your Custom Schema in YAML files. To run tests defined in a schema_tests.yml file for a custom schema defined in schema.yml use:
$ ./pantherlog test schema.yml schema_tests.yml
The first argument is a file or directory containing schema YAML files. The rest of the arguments are test files to run. If you don't specify any test file arguments and the first argument is a directory, the tool will look for tests in YAML files with a _tests.yml suffix.
Below is an example of a test using the previous JSON log sample, testing against our inferred schema with the added flag isEventTime: true under the time field to ensure the correct timestamp:
schema_tests.yml
schema.yml
# Make sure to use camelCase when naming the schema or log type
name: Custom Log Test Name
logType: Custom.SampleLog.V1
input: |
  {
    "method": "GET",
    "path": "/-/metrics",
    "format": "html",
    "controller": "MetricsController",
    "action": "index",
    "status": 200,
    "params": [],
    "remote_ip": "1.1.1.1",
    "user_id": null,
    "username": null,
    "ua": null,
    "queue_duration_s": null,
    "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
    "cpu_s": 0.05,
    "db_duration_s": 0,
    "view_duration_s": 0.00039,
    "duration_s": 0.0459,
    "tag": "test",
    "time": "2019-11-14T13:12:46.156Z"
  }
result: |
  {
    "action": "index",
    "controller": "MetricsController",
    "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
    "cpu_s": 0.05,
    "db_duration_s": 0,
    "duration_s": 0.0459,
    "format": "html",
    "method": "GET",
    "path": "/-/metrics",
    "remote_ip": "1.1.1.1",
    "status": 200,
    "tag": "test",
    "time": "2019-11-14T13:12:46.156Z",
    "view_duration_s": 0.00039,
    "p_log_type": "Custom.SampleLog.V1",
    "p_row_id": "acde48001122a480ca9eda991001",
    "p_event_time": "2019-11-14T13:12:46.156Z",
    "p_parse_time": "2022-04-04T16:12:41.059224Z",
    "p_any_ip_addresses": [
      "1.1.1.1"
    ]
  }
version: 0
schema: Custom.SampleLog.V1
fields:
  - name: action
    required: true
    type: string
  - name: controller
    required: true
    type: string
  - name: correlation_id
    required: true
    type: string
  - name: cpu_s
    required: true
    type: float
  - name: db_duration_s
    required: true
    type: bigint
  - name: duration_s
    required: true
    type: float
  - name: format
    required: true
    type: string
  - name: method
    required: true
    type: string
  - name: path
    required: true
    type: string
  - name: remote_ip
    required: true
    type: string
    indicators:
      - ip
  - name: status
    required: true
    type: bigint
  - name: tag
    required: false
    type: string
  - name: time
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: view_duration_s
    required: true
    type: float

Uploading log schemas with the Panther Analysis Tool

If you choose to maintain your log schemas outside of Panther, for example in order to keep them under version control and review changes before updating, you can upload the YAML files programmatically with the Panther Analysis Tool.
The uploader command receives a base path as an argument and recursively discovers all files with .yml and .yaml extensions.
It is recommended to keep schema files separate from other unrelated files; otherwise, you may see errors when the uploader attempts to process non-schema YAML files.
panther_analysis_tool update-custom-schemas --path ./schemas
The uploader checks whether a schema with the same name already exists and updates it, or creates a new one if no matching schema name is found.
The schema field must always be defined in the YAML file and be consistent with the existing schema name for an update to succeed (see the example below).
The uploaded files are validated with the same criteria as Web UI updates.
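For instance, a file under ./schemas could look like the sketch below (the file name is arbitrary, and the fields are abbreviated from the Custom.SampleLog.V1 example above); the schema field is what ties the file to the schema being updated:

# schemas/custom_samplelog.yml (hypothetical file name)
schema: Custom.SampleLog.V1
version: 0
fields:
  - name: time
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: method
    type: string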

Troubleshooting Custom Logs

Visit the Panther Knowledge Base to view articles about custom log sources that answer frequently asked questions and help you resolve common errors and issues.