Custom Logs

Define, write, and manage custom schemas

Overview

Panther allows you to define your own custom schemas. You can ingest custom logs into Panther via a Data Transport, and your custom schema will then normalize and classify the data.

This page explains how to define, write, and manage custom schemas, as well as how to upload schemas with Panther Analysis Tool (PAT). For information on how to use pantherlog to work with custom schemas, please see pantherlog CLI tool.

Custom schemas are identified by a Custom. prefix in their name and can be used wherever a natively supported log type is used:

Log ingestion
- You can onboard custom logs through a Data Transport (S3, SQS, Google Cloud Storage, CloudWatch Logs, or Google Cloud Pub/Sub).
Detections
- You can write Rules for custom schemas.
Investigations
- You can query the data in Indicator Search and in Data Explorer. Panther will create a new table for the custom schema once you onboard a source that uses it.
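
As a rough illustration of the Detections point above, a rule for a custom schema is ordinary Python that inspects the fields the schema defines. The sketch below assumes a hypothetical custom schema whose events carry method, path, and remote_ip fields; the choice to flag DELETE requests and the title wording are made up for illustration.

```python
# Sketch of a rule for a hypothetical custom schema.
# The field names (method, path, remote_ip) and the DELETE condition
# are assumptions chosen only for illustration.

def rule(event):
    # Return True when the event should trigger the rule.
    return event.get("method") == "DELETE"


def title(event):
    # Build a human-readable title from fields defined in the schema.
    return (
        f"{event.get('method')} request to {event.get('path')} "
        f"from {event.get('remote_ip')}"
    )
```

Because the functions only call get on the event, they can be exercised locally with a plain dictionary before the rule is uploaded.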

How to define a custom schema

Panther supports JSON data formats and CSV with or without headers for custom log types. For inferring schemas, Panther does not support CSV without headers.

You can define a schema via the following methods:

Inferring from sample logs
- In the Panther Console, as described below
- Using the pantherlog CLI tool
Inferring from S3 data received in Panther
Inferring from historical S3 data
Manual creation

Open the tabs below for instructions.

Generating a schema from sample logs

You can generate a schema by uploading sample logs into the Panther Console. If you'd like to use the command line instead, follow the instructions on using the pantherlog CLI tool here.

To get started, follow these steps:

Log in to your Panther Console.
On the left sidebar, navigate to Configure > Schemas.
At the top right of the page next to the search bar, click Create New.
Enter a Schema ID, Description, and Reference URL.
- The Description is meant for content about the table, while the Reference URL can be used to link to internal resources.
Optionally enable Field Discovery by clicking its toggle ON. Learn more in Enabling field discovery.
Field discovery is in open beta starting with Panther version 1.77, and is available to all customers. Please share any bug reports and feature requests with your Panther support team.
Scroll to the bottom of the page where you'll find the option to upload sample log files.
Upload a sample set of logs: Drag a file from your computer over the "Infer schema from sample logs" box or click Select file and choose the log file. Note that Panther does not support CSV without headers for inferring schemas.
- After uploading a file, Panther will display the raw logs in the UI. You can expand the log lines to view the entire raw log. Note that if you add another sample set, it will override the previously-uploaded sample.
Select the appropriate Stream Type (view examples for each type here).
- Lines: Events are separated by a new line character.
- JSON: Events are in JSON format.
- JSON Array: Events are inside an array of JSON objects.
- CloudWatch Logs: Events came from CloudWatch Logs.
Click Infer Schema
- Panther will begin to infer a schema from the raw sample logs.
- Panther will attempt to infer multiple timestamp formats.
- Once the schema is generated, it will appear in the schema editor box above the raw logs.
To ensure the schema works properly against the sample logs you uploaded and against any changes you make to the schema, click Validate & Test Schema.
- This test will validate that the syntax of your schema is correct and that the log samples you have uploaded into Panther are successfully matching against the schema. You should see the results appear below the schema editor box.
- All successfully matched logs will appear under Matched; each log will display the column, field, and JSON view.
- All unsuccessfully matched logs will appear under Unmatched; each log will display the error message and the raw log.
Click Save to publish the schema.

Panther will infer from all logs uploaded, but will only display up to 100 logs to ensure fast response time when generating a schema.
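
To make the difference between the stream types listed above more concrete, here is a minimal sketch of how the same raw text could be split into events under each one. It is not Panther's parser: the function name and the lowercase stream type labels are assumptions, and CloudWatch Logs is left out.

```python
import json


def split_events(raw: str, stream_type: str) -> list:
    """Split raw text into events for an assumed stream type label."""
    if stream_type == "lines":
        # One event per line; blank lines are skipped.
        return [line for line in raw.splitlines() if line.strip()]
    if stream_type == "json":
        # A sequence of JSON objects, each possibly spanning several lines.
        decoder = json.JSONDecoder()
        events, pos = [], 0
        while pos < len(raw):
            # raw_decode does not skip leading whitespace, so do it here.
            while pos < len(raw) and raw[pos].isspace():
                pos += 1
            if pos >= len(raw):
                break
            obj, pos = decoder.raw_decode(raw, pos)
            events.append(obj)
        return events
    if stream_type == "json_array":
        # A single JSON array whose elements are the events.
        return json.loads(raw)
    raise ValueError(f"unsupported stream type: {stream_type}")
```

If a sample file produces one giant event or a parse error here, that is a hint that a different Stream Type should be selected before clicking Infer Schema.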

Inferring a custom schema from S3 data received in Panther

You can generate and publish a schema for a custom log source from live data streaming from an S3 bucket into Panther. You will first view your S3 data in Panther, then infer a schema, then test the schema.

View raw S3 data

After onboarding your S3 bucket into Panther, you can view raw data coming into Panther and infer a schema from it:

Follow the instructions to onboard an S3 bucket onto Panther without having a schema in place.
While viewing your log source's Overview tab, scroll down to the Attach a schema to start classifying data section.
Choose from the following options:
- I want to add an existing schema: Choose this option if you already created a schema and you know the S3 prefix you want Panther to read logs from. Click Start in the tile.
  - You will see an S3 Prefixes & Schemas popup modal:
- I want to generate a schema from raw events: Select this option to generate a schema from live data in this bucket and define which prefixes you want Panther to read logs from. Click Start in the tile.
  - Note that you may need to wait up to 15 minutes for data to start streaming into Panther.
  - On the page you are directed to, you can view the raw data Panther has received at the bottom of the screen:
    This data is displayed from data-archiver, a Panther-managed S3 bucket that retains raw logs for up to 15 days for every S3 log source.
    Only raw log events that were placed in the S3 bucket after you configured the source in Panther will be visible, even if you've set the timespan to look further back.
    If your raw events are JSON-formatted, you can view them as JSON by clicking View JSON in the left-hand column.

Infer a schema from raw data

If you chose I want to generate a schema from raw events in the previous section, you can now infer a schema.

Once you see data populating in Raw Events, you can filter the events you'd like to infer a schema from by using the string Search, S3 Prefix, Excluded Prefix, and/or Time Period filters at the top of the Raw Events section.
Click Infer Schema to generate a schema.
On the Infer New Schema modal that pops up, enter the following:
- New Schema Name: The name of the schema that will map to the table in the data lake once the schema is published.
  - The name always starts with Custom. and the character that follows must be a capital letter.
- S3 Prefix: Use an existing prefix that was set up prior to inferring the schema or a new prefix.
  - Data under the prefix you choose is classified with the schema you've inferred.
  - If you don't need a specific prefix, leave this field empty to use the catch-all prefix, *.
Click Infer Schema.
- At the top of the page, you will see '<schema name>' was successfully inferred.
  - Click Done.
- The schema will then be placed in a Draft mode until you're ready to publish to production after testing.
Review the schema and its fields by clicking its name.
- Since the schema is in Draft, you can change, remove, or add fields as needed.
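
The naming rule mentioned in the steps above (a Custom. prefix followed by a capital letter) can be checked before you type a name into the modal. This is a minimal sketch that encodes only those two stated rules; the function name is hypothetical, and any further restrictions Panther places on the rest of the name are not modeled.

```python
import re

# Only the two rules stated in this section: "Custom." prefix, then a capital letter.
_NAME_PREFIX = re.compile(r"^Custom\.[A-Z]")


def looks_like_custom_schema_name(name: str) -> bool:
    """Return True if the name satisfies the prefix and capital-letter rules."""
    return _NAME_PREFIX.match(name) is not None
```

For example, Custom.HTTPProxy passes this check, while custom.httpproxy and Custom.httpProxy do not.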

Test the schema with raw data

Once your schemas and prefixes are defined, you can proceed to testing the schema configuration against raw data.

In the Test Schemas section at the top of the screen, click Run Test.
On the Test Schemas modal that pops up, select the Time Period you would like to test your schema against, then click Start Test.
- Depending on the time range and amount of data, the test may take a few minutes to complete.
- Once the test is started, the results appear with the number of matched and unmatched events.
  - Matched Events represent the number of events that would successfully classify against the schema configuration.
  - Unmatched Events represent the number of events that would not classify against the schema.
If there are Unmatched Events, inspect the errors and the JSON to decipher what caused the failures.
- Click Back to Schemas, make changes as needed, and test the schema again.
Click Back to Schemas.
In the upper right corner, click Save.
- The inferred schema is now attached to your log source.

Inferring custom schemas from historical S3 data

You can infer and save one or multiple schemas for a custom S3 log source from historical data in your S3 bucket (i.e., data that was added to the bucket before it was onboarded as a log source in Panther).

Prerequisite: Onboard your S3 bucket to Panther

Follow the instructions to onboard an S3 bucket onto Panther without having a schema in place.
- If you have onboarded the S3 source with a custom IAM role, that role must have the ListBucket permission.
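
For reference, the ListBucket permission mentioned above corresponds to the s3:ListBucket action on the bucket itself. The sketch below builds such a policy statement as a Python dictionary and prints it as JSON. The bucket name my-log-bucket is a placeholder, and the statement covers only ListBucket; whatever other permissions your custom role needs are out of scope here.

```python
import json


def list_bucket_statement(bucket_name: str) -> dict:
    """Build an IAM statement allowing s3:ListBucket on one bucket."""
    return {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        # ListBucket applies to the bucket ARN, not to the objects inside it.
        "Resource": f"arn:aws:s3:::{bucket_name}",
    }


# "my-log-bucket" is a placeholder bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [list_bucket_statement("my-log-bucket")],
}

print(json.dumps(policy, indent=2))
```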

Step 1: View the S3 bucket structure in Panther

After creating your S3 bucket source in Panther, you can view your S3 bucket's structure and data in the Panther Console:

In the Panther Console, navigate to Configure > Log Sources. Click into your S3 log source.
In the log source's Overview tab, scroll down to the Attach a Schema to start classifying the data section.
On the right side of the I want to generate a schema from bucket data tile, click Start.
- You will be redirected to a folder inspection of your S3 bucket. Here, you can view and navigate through all folders and objects in the S3 bucket.
- Alternatively, you can access the folder inspection of your S3 bucket via the success page after onboarding your S3 source in Panther. From that page, click Attach or Infer Schemas.

Step 2: Navigate through your data

While viewing the folder inspection, click an object.
- A preview window will appear, displaying a preview of its events:

In Panther, an S3 object is highlighted. A pop-over window is displaying a preview of its events.

If the events fail to render correctly (either generating an error or displaying events improperly), it's possible the wrong stream type has been chosen for the S3 bucket source. If this is the case, click Selected Logs Format is n:

On the source's folder selection view in the Panther Console, the option to select a stream type appears at the top.

Step 3: Indicate whether each folder has an existing schema or a new one should be inferred

After reviewing what's included in your bucket, you can determine whether one or multiple schemas are necessary to represent all of the bucket's data. Next, select folders that contain data with distinct structures and either infer a new schema or assign an existing one.

Determine whether one or more schemas will need to be inferred from the data in your S3 bucket.
- If all data in the S3 bucket is of the same structure (and therefore can be represented by one schema), you can leave the default Infer New Schema option selected on the bucket level. This generates a single schema for all data in the bucket.
- If the S3 bucket includes data that need to be classified in multiple schemas, follow the steps below for each folder in the bucket:
  1. Select a folder and click Include.
    Alternatively, if there is a folder or subfolder that you do not want Panther to process, select it and click Exclude.
  2. If you have an existing schema that matches the data, click the Schema dropdown on the right side of the row, then select the schema:
    By default, each newly included folder has the Infer New Schema option selected.
Click Infer n Schemas.

Step 4: Wait for schemas to be inferred

The schema inference process may take up to 15 minutes. You can leave this page while the process completes. You can also stop this process early, and keep the schema(s) inferred during the time that the process ran.

The source page in Panther shows the schema inference details, including an infer skipped and the number of events processed.

Step 5: Review the results

After the inference process is complete, you can view the resulting schemas and the number of events that were used during each schema's inference. You can also validate how each schema parses raw events.

Click the play icon on the right side of each row.
Click the Events tab to see the raw and normalized events.
Click the Schema tab to see the generated schema.

Step 6: Name the schema(s) and save source

Before saving the source, name each of the newly inferred schemas with a unique name by clicking Add name.

After all new schemas have been named, you will be able to click Save Source in the upper right corner.

Writing schemas

See the tabs below for instructions on writing schemas for JSON logs and for text logs.

Note that you can use the pantherlog CLI tool to generate your Log Schema.

Writing a schema for JSON logs

To parse log files where each line is JSON, you have to define a log schema that describes the structure of each log entry.

You can edit the YAML specifications directly in the Panther Console or they can be prepared offline in your editor/IDE of choice. For more information on the structure and fields in a Log Schema, see the Log Schema Reference.

In the example below, the JSON log structure is shown first, followed by a minified version of the same log and then the corresponding Log Schema.

Note: Please leverage the Minified JSON Log Example when using the pantherlog tool or generating a schema within the Panther Console.

JSON log example:

{
  "method": "GET",
  "path": "/-/metrics",
  "format": "html",
  "controller": "MetricsController",
  "action": "index",
  "status": 200,
  "params": [],
  "remote_ip": "1.1.1.1",
  "user_id": null,
  "username": null,
  "ua": null,
  "queue_duration_s": null,
  "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
  "cpu_s": 0.05,
  "db_duration_s": 0,
  "view_duration_s": 0.00039,
  "duration_s": 0.0459,
  "tag": "test",
  "time": "2019-11-14T13:12:46.156Z"
}

Minified JSON log example:

{"method":"GET","path":"/-/metrics","format":"html","controller":"MetricsController","action":"index","status":200,"params":[],"remote_ip":"1.1.1.1","user_id":null,"username":null,"ua":null,"queue_duration_s":null,"correlation_id":"c01ce2c1-d9e3-4e69-bfa3-b27e50af0268","cpu_s":0.05,"db_duration_s":0,"view_duration_s":0.00039,"duration_s":0.0459,"tag":"test","time":"2019-11-14T13:12:46.156Z"}
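
If your sample logs are pretty-printed, one way to produce the minified form recommended above is to parse and re-serialize them without whitespace. This is a small sketch using only the Python standard library; the function name is hypothetical.

```python
import json


def minify_json(pretty: str) -> str:
    """Re-serialize a JSON document on a single line with no extra whitespace."""
    # Compact separators drop the spaces after "," and ":".
    return json.dumps(json.loads(pretty), separators=(",", ":"))
```

Key order is preserved, so the minified output lists fields in the same order as the original log.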

Log Schema:

version: 0
fields:
- name: time
  description: Event timestamp
  required: true
  type: timestamp
  timeFormats:
  - rfc3339
  isEventTime: true
- name: method
  description: The HTTP method used for the request
  type: string
- name: path
  description: The path used for the request
  type: string
- name: remote_ip
  description: The remote IP address the request was made from
  type: string
  indicators: [ ip ] # the value will be appended to `p_any_ip_addresses` if it's a valid ip address
- name: duration_s
  description: The number of seconds the request took to complete
  type: float
- name: format
  description: Response format
  type: string
- name: user_id
  description: The id of the user that made the request
  type: string
- name: params
  type: array
  element:
    type: object
    fields:
    - name: key
      description: The name of a Query parameter
      type: string
    - name: value
      description: The value of a Query parameter
      type: string
- name: tag
  description: Tag for the request
  type: string
- name: ua
  description: UserAgent header
  type: string
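
Before uploading samples, it can help to sanity-check an event against the fields you have declared. The sketch below hand-copies a few fields from the schema above into Python data and reports missing required fields or obvious type mismatches. It is only an approximation under stated assumptions: the helper names are hypothetical, only three types are handled, and Panther's own matching logic may be more lenient or stricter.

```python
from datetime import datetime

# Hand-copied subset of the schema above (an assumption: a real workflow
# would load the YAML file instead of duplicating it).
FIELDS = [
    {"name": "time", "type": "timestamp", "required": True},
    {"name": "method", "type": "string"},
    {"name": "remote_ip", "type": "string"},
    {"name": "duration_s", "type": "float"},
]


def _type_ok(value, type_name: str) -> bool:
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "float":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "timestamp":
        try:
            # Accept RFC 3339 style values such as 2019-11-14T13:12:46.156Z.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return True
        except (AttributeError, ValueError):
            return False
    return True  # Types not modeled in this sketch are not checked.


def check_event(event: dict) -> list:
    """Return a list of problems; an empty list means no problems were found."""
    problems = []
    for field in FIELDS:
        value = event.get(field["name"])
        if value is None:
            if field.get("required"):
                problems.append(f"missing required field: {field['name']}")
            continue
        if not _type_ok(value, field["type"]):
            problems.append(f"field {field['name']}: expected {field['type']}")
    return problems
```

Running check_event on the sample log above returns an empty list, while an event without a time field is reported as missing a required field.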

Managing custom schemas

Editing a custom schema

Panther allows custom schemas to be edited. Specifically, you can perform the following actions:

Add new fields.
Rename or delete existing fields.
Edit, add, or remove all properties of existing fields.
Modify the parser configuration to fix bugs or add new patterns.
Archive or unarchive the schema.
Enable or disable automatic field discovery.

Note: After editing a field's type, any newly ingested data will match the new type while any previously ingested data will retain its type.

To edit a custom schema:

Navigate to your custom schema's details page in the Panther Console.
Click Edit in the upper right corner of the details page.
Modify the YAML.
- Click Diff View in the upper right corner of the text editor to see the additions, edits, and subtractions via the code editor. It also includes the ability to copy or revert deleted lines.
Click Update to submit your change.

Click Validate Syntax to check the YAML for structural compliance. Note that the rules will only be checked after you click Update. The update will be rejected if the rules are not followed.

Editing schema fields might require updates to related detections and saved queries. Click on the related entities in the alert banner displayed above the schema editor to view, update, and test the list of affected detections and saved queries.

A banner message says that editing schemas might require updates to related detections and saved queries. The message links to a list of detections and queries to review and test.

Query implications

Queries will work across changes to a Type provided the query does not use a function or operator which requires a field type that is not castable across Types.

Good example: The Type is edited from string to int where all existing values are numeric (e.g., "1"). A query using the function sum aggregates old and new values together.
Bad example: The Type is edited from string to int where some of the existing values are non-numeric (e.g., "apples"). A query using the function sum excludes values that are non-numeric.
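
The two examples above can be mimicked in a few lines of Python. This is only an analogy for the query behavior described here, not how the data lake implements casting; the function names are hypothetical.

```python
def try_cast_int(value):
    """Return the value as an int, or None when it cannot be cast."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def sum_castable(values) -> int:
    """Sum the values that can be cast to int, skipping the rest."""
    casted = (try_cast_int(v) for v in values)
    return sum(v for v in casted if v is not None)
```

With old string values "1" and "2" and a new int value 3, the sum covers all three. If one old value is "apples", it is silently left out of the total, which is why the second case is the one to watch for.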

Query castability table

This table shows which Types can be cast as each Type when running a query. Schema editing allows any Type to be changed to another Type.

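| Type From -> To | boolean | string | int | bigint | float | timestamp |
| boolean | same | yes | no | no | no | no |
| string | yes | same | numbers only | numbers only | numbers only | numbers only |
| int | yes | yes | same | yes | yes | numbers only |
| bigint | yes | yes | yes | same | yes | numbers only |
| float | yes | yes | no | no | same | numbers only |
| timestamp | no | yes | no | no | no | same |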

Archiving and unarchiving a custom schema

You can archive and unarchive custom schemas in Panther. You might choose to archive a schema if it's no longer used to ingest data, and you do not want it to appear as an option in various dropdown selectors throughout Panther. In order to archive a schema, it must not be in use by any log sources. Schemas that have been archived still exist indefinitely; it is not possible to permanently delete a schema.

Archiving a schema does not affect any data ingested using that schema already stored in the data lake—it is still queryable using Data Explorer and Indicator Search. By default, archived schemas are not shown in the schema list view (visible on Configure > Schemas), but can be shown by modifying Status, within Filters, in the upper right corner. In the Data Explorer, tables of archived schemas are not shown under Tables.

Attempting to create a new schema with the same name as an archived schema will result in a name conflict, and prompt you to instead unarchive and edit the existing schema.

To archive or unarchive a custom schema:

In the Panther Console, navigate to Configure > Schemas.
- Locate the schema you'd like to archive or unarchive.
Click the three dots icon in the upper right corner of the tile, and select Archive or Unarchive.
- If you are archiving a schema and it is currently associated with one or more log sources, the confirmation modal will prompt you to first detach the schema. Once you have done so, click Refresh.
On the confirmation modal, click Continue.

Testing a custom schema

The "Test Schema against sample logs" feature found on the Schema Edit page in the Panther Console supports Lines, CSV (with or without headers), JSON, JSON Array, and CloudWatch Logs. See Stream Types for examples.

Additionally, the above log formats can be compressed using the following formats:

gzip
zstd (without dictionary)

Multi-line logs are supported for JSON and JSON Array formats.
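
If you want to test with a compressed sample, the following sketch produces gzip-compressed, line-delimited JSON using only the Python standard library. The function name is hypothetical; zstd is not shown because it needs a third-party package on most Python versions.

```python
import gzip
import json


def gzip_json_lines(events) -> bytes:
    """Serialize events as one minified JSON object per line, then gzip them."""
    payload = "\n".join(json.dumps(e, separators=(",", ":")) for e in events) + "\n"
    return gzip.compress(payload.encode("utf-8"))
```

Writing the returned bytes to a file (for example, a hypothetical sample.json.gz) gives you a compressed sample you can upload on the schema's page.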

Need to validate that a custom schema will work against your logs? You can test sample logs by following this process:

In the Panther Console, go to Configure > Schemas.
Click on a custom schema.
In the schema details page, scroll to the bottom of the page where you'll be able to upload logs.

In the Panther Console below a schema, there is a section labeled "Test a schema against sample logs." In that section, there is an option to drag and drop in a file or to select a file to upload.

Enabling field discovery

Field discovery is in open beta starting with Panther version 1.77, and is available to all customers. Please share any bug reports and feature requests with your Panther support team.

Log source schemas in Panther define the log event fields that will be stored in Panther. When field discovery is enabled, data from fields in incoming log events that are not defined in the corresponding schema will not be dropped—instead, the fields will be identified, and the data will be stored. This means you can subsequently query data from these fields, and write detections referencing them.

Field discovery can only be enabled for JSON and CSV with header log schemas. Field discovery is currently only available for custom schemas, not Panther-managed ones.

In the Panther Console, the page for a "Custom.HTTPProxy" schema is shown. There are fields for Description, Reference URL, and Field Discovery. The Field Discovery toggle is set to ON.

Limitations

Field discovery currently has the following limitations:

It will discover a maximum of 200 top-level fields.
- There is no limitation on the number of overall fields or nested fields discovered.
Fields whose names contain any of the special characters @,`$*&!%+/#~= will be ignored.
It cannot discover nested fields inside objects where the parent object is defined in the schema with the isEmbeddedJSON: true property.
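
To get a feel for which top-level fields field discovery might pick up from an event, the sketch below lists the keys that are not already in the schema and whose names contain none of the special characters listed above. It rests on several assumptions: the character list is taken literally (comma included), the function name is hypothetical, and the 200-field cap and the isEmbeddedJSON restriction are not modeled.

```python
# Characters from the limitation above, taken literally (assumption: the
# comma is one of the special characters rather than a separator).
SPECIAL_CHARS = set("@,`$*&!%+/#~=")


def discoverable_fields(event: dict, schema_field_names: set) -> list:
    """List top-level keys that are not in the schema and have no special characters."""
    return [
        name
        for name in event
        if name not in schema_field_names and not (SPECIAL_CHARS & set(name))
    ]
```

For an event with the keys time, new_field, bad@name, and a/b, and a schema that defines only time, this returns just new_field.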

Uploading log schemas with the Panther Analysis Tool

If you choose to maintain your log schemas outside of Panther, for example in order to keep them under version control and review changes before updating, you can upload the YAML files programmatically with the Panther Analysis Tool.

The uploader command receives a base path as an argument and then proceeds to recursively discover all files with extensions .yml and .yaml.

Keep schema files in a directory separate from unrelated YAML files. Otherwise, the uploader will also attempt to upload those files and report an error for each one that is not a valid schema.

panther_analysis_tool update-custom-schemas --path ./schemas

The uploader checks whether a schema with the same name already exists. If one does, it updates that schema; otherwise, it creates a new one.

The schema field must always be defined in the YAML file and must match the existing schema's name for an update to succeed. For a list of all available CI/CD fields, see our Log Schema Reference.

The uploaded files are validated with the same criteria as Web UI updates.
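
Before running the uploader, you may want to preview which files it would pick up and confirm that each one declares a schema name. The sketch below mirrors the discovery behavior described above (a recursive search for .yml and .yaml files) and does a simple line-based check for a top-level schema key. It is not part of the Panther Analysis Tool: the function names are hypothetical, and the regular expression is a shortcut rather than a YAML parser, so quoted or unusually formatted values are not handled.

```python
import re
from pathlib import Path

# Matches a top-level "schema: <name>" line; this is not a full YAML parser.
_SCHEMA_LINE = re.compile(r"^schema:\s*(\S+)", re.MULTILINE)


def find_schema_files(base_path: str) -> list:
    """Recursively collect .yml and .yaml files under base_path, sorted by path."""
    base = Path(base_path)
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )


def declared_schema_name(yaml_text: str):
    """Return the value of a top-level schema key, or None if it is absent."""
    match = _SCHEMA_LINE.search(yaml_text)
    return match.group(1) if match else None
```

Any file that find_schema_files returns but for which declared_schema_name yields None is a likely source of upload errors and a candidate for moving out of the schemas directory.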

Troubleshooting Custom Logs

Visit the Panther Knowledge Base to view articles about custom log sources that answer frequently asked questions and help you resolve common errors and issues.


Last updated 2 years ago
