Custom Logs

Overview

Panther allows users to define their own log types by creating a Custom Schema. You can generate a Custom Schema from live data in S3, from sample data uploaded into the Panther Console, or with our pantherlog CLI tool.
Custom Schemas are identified by a Custom. prefix in their name and can be used wherever a 'native' Log Type is used:
  • You can use a Custom Schema when onboarding data through S3, SQS or GCS.
  • You can write Rules for Custom Schemas.
  • You can query the data in Data Explorer. Panther will create a new table for the Custom Schema once you onboard a source that uses it.
  • You can query the data through Indicator Search.
Panther currently supports JSON and CSV (with headers) data formats; CSV without headers is not supported.

Generating and testing a data schema for a Custom Log from data in S3

You can generate a schema for a custom log from live data streaming from an S3 bucket and then test that schema before publishing.

View raw data from S3

After onboarding your S3 bucket onto Panther, you can view raw data coming into Panther before processing. To view the raw data and kick off schema inference/testing, proceed with the following:
  1. Follow the instructions to onboard an S3 bucket onto Panther without having a schema in place.
    • Skip the step where you list schemas and prefixes on the first page of the S3 onboarding wizard.
    • You'll have the opportunity to add schemas and prefixes after the S3 bucket is onboarded.
  2. While viewing your log source, click Schemas in the left-hand navigation, below the source name and alongside the Overview and Health sections.
  3. Follow the options on the screen to configure schemas. Choose from the following options:
    • I already know schemas and prefixes: Choose this option if you already created a schema and you know the S3 prefix you want Panther to read logs from. Click Start in the tile.
    • I want to generate a schema from raw events: Select this option to generate a schema from live data in this bucket and define which prefixes you want Panther to read logs from. Click Start in the tile.
    • Create new schema: You can also create your own schema from the Schemas page and return here to attach it. Click Create New Schema below the tile options.
Note: You may need to wait up to 15 minutes for data to start streaming into Panther.
On the schema inference and testing workflow page, you can view the raw data that Panther has received at the bottom of the screen:
  • This data is displayed from data-archiver, a Panther-managed S3 bucket that retains raw logs for up to 30 days for every S3 log source.
  • If you still do not see data after 15 minutes, ensure that the time picker is set to the appropriate time range that corresponds with the timestamps on the events coming into Panther.

Infer a schema from raw data

Using raw live data coming from S3, you can infer schemas with just a few steps.
  1. Once you see data populating in Raw Events, use the time, prefix, and string filters at the top of the raw events table to narrow down the events you'd like to infer a schema from.
  2. Click Infer Schema to generate a schema.
  3. On the Infer New Schema modal that pops up, enter the following:
    • Schema Name: The name of the schema that will map to the table in the data lake once the schema is published.
      • The name always starts with Custom., which must be followed by a capital letter.
    • Prefix: Use an existing prefix that was set up prior to inferring the schema, or a new one.
      • The prefix you choose will route data from the corresponding prefix in the S3 bucket to the schema you've inferred.
      • If you don't need a specific prefix, you can leave this field empty to use the catch-all prefix, *.
  4. Click Infer Schema.
    • The schema is placed in Draft mode until you're ready to publish it to production after testing.
  5. Review the schema and its fields by going to the Schemas section and clicking the schema name.
    • After the schema is inferred, you'll see the name you gave it with a Draft label.
    • Since the schema is in Draft, you can change, remove, or add fields as needed.

Test schema with raw data

Once your schemas and prefixes are defined, you can proceed to testing the schema configuration against raw data.
  1. In the Test Schemas section at the top of the screen, click the Run Test button.
  2. On the Test Schemas modal that pops up, select the Time Period you would like to test your schemas against and click the Start Test button.
    • Depending on the time range and amount of data, the test may take a few minutes to complete.
    • Once the test has started, the results appear with the counts of matched and unmatched events.
      • Matched events represent the number of events that would successfully classify against the schema configuration.
      • Unmatched events represent the number of events that would fail to classify.
  3. Inspect the errors and the JSON to determine what caused the failures.
  4. Navigate back to the draft schema, make changes as needed, and test the schemas again.

Generating a schema for a Custom Log from sample logs

You can generate a schema for a Custom Log by uploading sample logs into the Panther UI. To get started, follow these steps:
  1. Log in to your Panther account.
  2. On the left sidebar, navigate to Data > Schemas.
  3. At the top right of the page, next to the search bar, click Create New.
  4. On the New Data Schema page, enter a Schema ID, Description, and Reference URL.
    • The Description is meant for content about the table, while the Reference URL can be used to link to internal resources.
  5. Scroll to the bottom of the page, where you'll find the option to upload sample log files.
  6. Upload a sample set of logs by dragging a file from your computer over the "Infer schema from sample logs" box or by clicking Select file and choosing the log file. (Reminder: Panther version 1.25 supports JSON files only; version 1.26 adds support for CSV files.)
    • After uploading a file, Panther will display the raw logs in the UI. You can expand the log lines to view the entire raw log. Note that if you add another sample set, it will override the previously uploaded sample.
  7. Click Infer Schema from All Logs.
    • Panther will begin to infer a schema from the raw sample logs. Depending on the number of logs uploaded, this could take some time (e.g. 30 seconds for 100 logs).
    • Once the schema is generated, it will appear in the schema editor box above the raw logs.
  8. To ensure the schema works properly against the sample logs you uploaded and against any changes you make to the schema, click Validate & Test Schema.
    • This test validates that the syntax of your schema is correct and that the log samples you uploaded match successfully against the schema. The results appear below the schema editor box.
    • All successfully matched logs appear under Matched; each log displays the column, field, and JSON view.
    • All unsuccessfully matched logs appear under Unmatched; each log displays the error message and the raw log.
  9. Click Save to publish the schema.
Note that the UI will only display up to 100 logs. This doesn't mean Panther can only infer from 100 logs: Panther infers from all uploaded logs, and the display limit is a performance measure to ensure fast response times when generating a schema.

Adding a Custom Log Manually

To add a Custom Log manually:
  1. In the Panther Console, navigate to Data > Schemas.
  2. Click New in the upper right corner.
  3. Enter a name for the Custom Log (e.g. Custom.SampleAPI) and write or paste your YAML Log Schema definition.
  4. Click Validate Syntax at the bottom to verify your schema contains no errors.
    • Note that syntax validation only checks the syntax of the Log Schema. Saving can still fail due to a name conflict.
  5. Click Save.
You can now navigate to Log Analysis > Sources and add a new source or modify an existing one to use the new Custom.SampleAPI Log Type. Once Panther receives events from this Source, it will process the logs and store the Log Events in the custom_sampleapi table.
You can also now write Rules to match against these logs and query them using the Data Explorer.
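For reference, a minimal Log Schema definition for the hypothetical Custom.SampleAPI log type might look like the sketch below; the fields are illustrative and should be adapted to your actual events:

version: 0
fields:
  - name: time
    description: Event timestamp
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: method
    description: HTTP method of the API call
    type: string
  - name: path
    description: Request path
    type: string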

Editing a Custom Log Type

Panther allows limited editing of a Custom Log. Specifically:
  • You can modify the parser configuration to fix bugs or add new patterns.
  • You can add new fields to the schema.
  • You can edit, add, or remove all properties of existing fields except the type.
  • You cannot rename existing fields.
  • You cannot delete existing fields (doing so would allow renaming in two steps).
  • You cannot change the type of an existing field (this includes the element type for array fields).
To edit a Custom Log:
  1. Navigate to your Custom Log Type's details page.
  2. Click Edit in the details page.
  3. Modify the YAML.
  4. Click Update to submit your change.
Validate Syntax can check the YAML for structural compliance, but the rules described above can only be checked on Update. The update will be rejected if the rules are not followed.
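To illustrate the rules above with a hypothetical duration_s field:

# Existing field (hypothetical):
- name: duration_s
  description: Request duration
  type: float

# Accepted on Update: properties other than the type may change, and new fields may be added
- name: duration_s
  description: The number of seconds the request took to complete
  required: true
  type: float

# Rejected on Update: the type of an existing field cannot change
- name: duration_s
  description: Request duration
  type: string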

Disabling a Custom Log

A Custom Log can be disabled if no source is using it.
A disabled Custom Log is removed from the listing and its tables are hidden from the Data Explorer view.
Disabling a Custom Log does not affect any data already stored in the data lake. All data is still queryable through Data Explorer or Indicator Search. Trying to add a log with the same name at a later time will fail due to the name conflict.
To disable a Custom Log:
  1. Navigate to the details page of the Custom Log type.
  2. Click the Enable toggle to set it to Disabled.

Testing a Panther-managed Log

This feature is available in versions 1.26 and above.
Need to validate that a Panther-managed schema will work against your logs? You can test sample logs against the Panther-managed schema similarly to testing logs against a custom schema/log type (as described above). Follow the steps below:
  1. Visit the Schemas page under the Data tab.
  2. Click on a schema labeled as Panther-managed.
  3. Once on the schema details page, scroll to the bottom of the page, where you'll be able to upload logs.

Writing a log schema for JSON logs

You can make use of our pantherlog CLI tool to help you generate your Log Schema.
To parse log files where each line is JSON, you have to define a Log Schema that describes the structure of each log entry.
In the example below, the first block displays the JSON log structure, the second shows the corresponding Log Schema, and the third shows the same log entry minified.
Note: Please use the Minified JSON Log Example when using the pantherlog tool or generating a schema within the Panther Console.
JSON Log Example:

{
  "method": "GET",
  "path": "/-/metrics",
  "format": "html",
  "controller": "MetricsController",
  "action": "index",
  "status": 200,
  "params": [],
  "remote_ip": "1.1.1.1",
  "user_id": null,
  "username": null,
  "ua": null,
  "queue_duration_s": null,
  "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
  "cpu_s": 0.05,
  "db_duration_s": 0,
  "view_duration_s": 0.00039,
  "duration_s": 0.0459,
  "tag": "test",
  "time": "2019-11-14T13:12:46.156Z"
}

Log Schema Example:

version: 0
fields:
  - name: time
    description: Event timestamp
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: method
    description: The HTTP method used for the request
    type: string
  - name: path
    description: The path used for the request
    type: string
  - name: remote_ip
    description: The remote IP address the request was made from
    type: string
    indicators: [ ip ] # the value will be appended to `p_any_ip_addresses` if it's a valid ip address
  - name: duration_s
    description: The number of seconds the request took to complete
    type: float
  - name: format
    description: Response format
    type: string
  - name: user_id
    description: The id of the user that made the request
    type: string
  - name: params
    type: array
    element:
      type: object
      fields:
        - name: key
          description: The name of a Query parameter
          type: string
        - name: value
          description: The value of a Query parameter
          type: string
  - name: tag
    description: Tag for the request
    type: string
  - name: ua
    description: UserAgent header
    type: string

Minified JSON Log Example:

{"method":"GET","path":"/-/metrics","format":"html","controller":"MetricsController","action":"index","status":200,"params":[],"remote_ip":"1.1.1.1","user_id":null,"username":null,"ua":null,"queue_duration_s":null,"correlation_id":"c01ce2c1-d9e3-4e69-bfa3-b27e50af0268","cpu_s":0.05,"db_duration_s":0,"view_duration_s":0.00039,"duration_s":0.0459,"tag":"test","time":"2019-11-14T13:12:46.156Z"}
You can edit the YAML specifications directly in the Panther Console, or prepare them offline in your editor/IDE of choice. For more information on the structure and fields in a Log Schema, see the Log Schema Reference.

Writing a log schema for text logs

Panther handles logs that are not structured as JSON by using a 'parser' that translates each log line into key/value pairs and feeds it as JSON to the rest of the pipeline. You can define a text parser using the parser field of the Log Schema. Panther provides the following parsers for non-JSON formatted logs:
Name         Description
fastmatch    Match each line of text against one or more simple patterns
regex        Use regular expression patterns to handle more complex matching such as conditional fields, case-insensitive matching, etc.
csv          Treat log files as CSV, mapping column names to field names
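As a sketch, assuming log lines like 2019-11-14T13:12:46Z INFO Server started, a fastmatch parser configuration could look like the following; the pattern and field names are illustrative, and the full set of parser options is documented in the Log Schema Reference:

version: 0
parser:
  fastmatch:
    match:
      - '%{time} %{level} %{message}'
fields:
  - name: time
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: level
    type: string
  - name: message
    type: string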

Pantherlog CLI

Panther provides a simple CLI tool to help you work with the Custom Logs feature. The tool is called pantherlog, and an executable for each platform is provided with each release. The executables can be downloaded from the panther-community S3 bucket; see more details on the operations help page.

Generating a schema from JSON samples

You can use the tool to generate a schema file from sample files in new-line delimited JSON format. The tool will scan the provided logs and print the inferred schema to stdout.
For example, to infer the schema of the logs in sample_logs.jsonl and write it to schema.yml, use:
$ ./pantherlog infer sample_logs.jsonl > schema.yml
Note that YAML keys and values are case-sensitive.
WARNING: The tool has the following limitations:
  • It will identify a string as a timestamp only if the string is in RFC3339 format. Make sure to review the schema after it is generated and identify fields that should be of type timestamp instead.
  • It will not mark any timestamp field as isEventTime: true. Make sure to select the appropriate timestamp field and mark it as isEventTime: true. For more information regarding isEventTime: true, see timestamp.
  • It can infer only 3 types of indicators: ip, aws_arn, url. Make sure to review the fields and add more indicators as appropriate.
Make sure to review the generated schema and edit it appropriately before deploying to your production environment!
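For example, if your events carry a hypothetical created_at field holding a Unix epoch timestamp, the tool would infer it as a plain numeric field; you would edit the generated schema along these lines before deploying:

# As inferred by pantherlog (hypothetical field):
- name: created_at
  type: bigint

# After review: declare it a timestamp and mark it as the event time
- name: created_at
  type: timestamp
  timeFormat: unix
  isEventTime: true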

Trying out a schema

You can use the tool to validate a schema file and use it to parse log files. Note that the events in the log files need to be separated by newlines. Processed logs are written to stdout and errors to stderr.
For example, to parse logs in sample_logs.jsonl with the log schema in schema.yml, use:
$ ./pantherlog parse --path schema.yml --schemas Schema.Name sample_logs.jsonl
The tool can also accept input via stdin so it can be used in a pipeline:
$ cat sample_logs.jsonl | ./pantherlog parse --path schema.yml

Running tests for a schema

You can use the tool to run unit tests, which you define for your Custom Schema in YAML files. To run tests defined in a schema_tests.yml file for a custom schema defined in schema.yml, use:
$ ./pantherlog test schema.yml schema_tests.yml
The first argument is a file or directory containing schema YAML files. The rest of the arguments are test files to run. If you don't specify any test file arguments and the first argument is a directory, the tool will look for tests in YAML files with a _tests.yml suffix.
Below is an example of a test that runs the previous JSON log sample against our inferred schema, with isEventTime: true added under the time field to ensure the correct timestamp is used:
schema_tests.yml:

# Make sure to use camelCase when naming the schema or log type
name: Custom Log Test Name
logType: Custom.SampleLog.V1
input: |
  {
    "method": "GET",
    "path": "/-/metrics",
    "format": "html",
    "controller": "MetricsController",
    "action": "index",
    "status": 200,
    "params": [],
    "remote_ip": "1.1.1.1",
    "user_id": null,
    "username": null,
    "ua": null,
    "queue_duration_s": null,
    "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
    "cpu_s": 0.05,
    "db_duration_s": 0,
    "view_duration_s": 0.00039,
    "duration_s": 0.0459,
    "tag": "test",
    "time": "2019-11-14T13:12:46.156Z"
  }
result: |
  {
    "action": "index",
    "controller": "MetricsController",
    "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
    "cpu_s": 0.05,
    "db_duration_s": 0,
    "duration_s": 0.0459,
    "format": "html",
    "method": "GET",
    "path": "/-/metrics",
    "remote_ip": "1.1.1.1",
    "status": 200,
    "tag": "test",
    "time": "2019-11-14T13:12:46.156Z",
    "view_duration_s": 0.00039,
    "p_log_type": "Custom.SampleLog.V1",
    "p_row_id": "acde48001122a480ca9eda991001",
    "p_event_time": "2019-11-14T13:12:46.156Z",
    "p_parse_time": "2022-04-04T16:12:41.059224Z",
    "p_any_ip_addresses": [
      "1.1.1.1"
    ]
  }
schema.yml:

version: 0
schema: Custom.SampleLog.V1
fields:
  - name: action
    required: true
    type: string
  - name: controller
    required: true
    type: string
  - name: correlation_id
    required: true
    type: string
  - name: cpu_s
    required: true
    type: float
  - name: db_duration_s
    required: true
    type: bigint
  - name: duration_s
    required: true
    type: float
  - name: format
    required: true
    type: string
  - name: method
    required: true
    type: string
  - name: path
    required: true
    type: string
  - name: remote_ip
    required: true
    type: string
    indicators:
      - ip
  - name: status
    required: true
    type: bigint
  - name: tag
    required: false
    type: string
  - name: time
    required: true
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
  - name: view_duration_s
    required: true
    type: float

Uploading log schemas with the Panther Analysis Tool

If you choose to maintain your log schemas outside of Panther, for example to keep them under version control and review changes before updating, you can upload the YAML files programmatically with the Panther Analysis Tool.
The uploader command receives a base path as an argument and recursively discovers all files with the .yml or .yaml extension.
It is recommended to keep schema files separate from other unrelated files; otherwise, the uploader may report errors as it attempts to upload files that are not valid schemas.
panther_analysis_tool update-custom-schemas --path ./schemas
The uploader checks whether a schema with the same name already exists and updates it; if no matching schema name is found, it creates a new one.
The schema field must always be defined in the YAML file and be consistent with the existing schema name for an update to succeed (see the schema.yml example above, which sets schema: Custom.SampleLog.V1).
The uploaded files are validated with the same criteria as Web UI updates.
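For instance, a layout like the following (file names hypothetical) keeps the uploader from tripping over unrelated YAML files:

# Only schema YAML lives under ./schemas; rules and other YAML live elsewhere
$ ls schemas/
custom_sampleapi.yml  custom_vpn.yml
$ panther_analysis_tool update-custom-schemas --path ./schemas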