Custom Logs

Define, write, and manage custom schemas

Overview

Panther allows you to define your own custom log schemas. You can ingest custom logs into Panther via a Data Transport, and your custom schemas will then normalize and classify the data.

This page explains how to determine how many custom schemas you need; how to infer, write, and manage custom schemas; and how to upload schemas with the Panther Analysis Tool (PAT). For information on how to use pantherlog to work with custom schemas, please see the pantherlog CLI tool documentation.

Custom schemas are identified by a Custom. prefix in their name and can be used wherever a natively supported log type is used:

  • Log ingestion

    • You can onboard custom logs through a Data Transport (e.g., HTTP webhook, S3, SQS, Google Cloud Storage, Azure Blob Storage)

  • Detections

    • You can write rules and scheduled rules for custom schemas.

  • Investigations

    • You can query the data in Search and in Data Explorer. Panther will create a new table for the custom schema once you onboard a source that uses it.

Determine how many custom schemas you need

There is no definitive rule for determining how many schemas you need to represent data coming from a custom source, as it depends on the intent of your various log events and the degree of field overlap between them.

In general, it's recommended to create the minimum number of schemas required for each log type's shape to be represented by its own schema (with room for some field variance between log types to be represented by the same schema). A rule of thumb is: if two different types of logs (e.g., application audit logs and security alerts) have less than 50% overlap in required fields, they should use different schemas.

The example scenarios below show the corresponding schema recommendations:

Scenario: You have one type of log with fields A, B, and C, and a different type of log with fields X, Y, and Z.

Recommendation: Create two different schemas, one for each log type. While it's technically possible to create one schema with all fields (A, B, C, X, Y, Z) marked as optional (i.e., required: false), it's not recommended, as downstream operations like detection writing and searching will be made more difficult.

Scenario: You have one type of log that always has fields A, B, and C, and a different type of log that always has fields A, B, and Z.

Recommendation: Create one schema, with fields A and B marked as required and fields C and Z marked as optional.
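As a rough sketch of that second recommendation, using the placeholder field names A, B, C, and Z (with string types assumed purely for illustration), the single schema's field list could look like this:

fields:
- name: A
  required: true    # appears in both log types
  type: string
- name: B
  required: true    # appears in both log types
  type: string
- name: C
  required: false   # only present in the first log type
  type: string
- name: Z
  required: false   # only present in the second log type
  type: string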

After you have determined how many schemas you need, you can define them.

How to define a custom schema

Panther supports JSON and CSV (with or without headers) data formats for custom log types. Note that Panther cannot infer schemas from CSV data without headers.

There are multiple ways to define a custom schema. You can:

  • Infer one or more schemas from data.

  • Create a schema manually.

Automatically infer the schema in Panther

Instead of writing a schema manually, you can let the Panther Console or the pantherlog CLI tool infer a schema (or multiple schemas) from your data.

When Panther infers a schema, note that if your data sample has:

  • A field of type object with more than 200 fields, that field will be classified as type json.

  • A field with mixed data types (i.e., it is an array with multiple data types, or the field itself has varying data types), that field will be classified as type json.
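As an illustration, a field that falls under either rule would appear in the inferred schema with the json value type, roughly like the sketch below (the field name payload is a placeholder):

fields:
- name: payload     # hypothetical field name
  required: false
  type: json        # nested object with 200+ fields, or mixed data types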

How to infer a schema

There are multiple ways to infer a schema in Panther:

  • In the Panther Console, you can infer a schema from uploaded sample logs, from S3 data received in Panther, from historical S3 data, or from HTTP data received in Panther (see the corresponding tabs below).

  • In the CLI workflow, use the pantherlog infer command.

Inferring a custom schema from sample logs

To get started, follow these steps:

  1. Log in to your Panther Console.

  2. On the left sidebar, navigate to Configure > Schemas.

  3. At the top right of the page next to the search bar, click Create New.

  4. Enter a Schema ID, Description, and Reference URL.

    • The Description is meant for content about the table, while the Reference URL can be used to link to internal resources.

  5. In the Schema section, in the Infer a schema from sample events tile, click Start.

  6. In the Infer schema from sample logs modal, click one of the radio buttons:

    • Upload Sample file: Upload a sample set of logs by dragging a file from your system over the pop-up modal, or by clicking Select file and choosing the log file.

      • Note that Panther does not support CSV without headers for inferring schemas.

  7. After uploading a file, Panther will display the raw logs in the UI. You can expand the log lines to view the entire raw log. Note that if you add another sample set, it will override the previously uploaded sample. Select the appropriate Stream Type:

    • Lines: Events are separated by a new line character.

    • JSON: Events are in JSON format.

    • JSON Array: Events are inside an array of JSON objects.

    • CloudWatch Logs: Events came from CloudWatch Logs.

    • Auto: Panther will automatically detect the appropriate stream type.

  8. Click Infer Schema.

    • Panther will begin to infer a schema from the raw sample logs.

    • Panther will attempt to infer multiple timestamp formats.

  9. To ensure the schema works properly against the sample logs you uploaded and against any changes you made to the schema, click Run Test.

    • This test will validate that the syntax of your schema is correct and that the log samples you have uploaded into Panther are successfully matching against the schema.

      • All successfully matched logs will appear under Matched; each log will display the column, field, and JSON view.

      • All unsuccessfully matched logs will appear under Unmatched; each log will display the error message and the raw log.

  10. Click Save to publish the schema.

Panther will infer from all logs uploaded, but will only display up to 100 logs to ensure fast response time when generating a schema.

Inferring a custom schema from S3 data received in Panther

View raw S3 data

After onboarding your S3 bucket into Panther, you can view raw data coming into Panther and infer a schema from it:

  1. Choose from the following options:

    • I want to add an existing schema: Choose this option if you already created a schema and you know the S3 prefix you want Panther to read logs from. Click Start in the tile.

    • I want to generate a schema from raw events: Select this option to generate a schema from live data in this bucket and define which prefixes you want Panther to read logs from. Click Start in the tile.

      • Note that you may need to wait up to 15 minutes for data to start streaming into Panther.

        • This data is displayed from data-archiver, a Panther-managed S3 bucket that retains raw logs for up to 15 days for every S3 log source.

        • Only raw log events that were placed in the S3 bucket after you configured the source in Panther will be visible, even if you've set the timespan to look further back.

        • If your raw events are JSON-formatted, you can view them as JSON by clicking View JSON in the left-hand column.

Infer a schema from raw data

If you chose I want to generate a schema from raw events in the previous section, you can now infer a schema.

  1. Once you see data populating in Raw Events, you can filter the events you'd like to infer a schema from by using the string Search, S3 Prefix, Excluded Prefix, and/or Time Period filters at the top of the Raw Events section.

  2. On the Infer New Schema modal that pops up, enter the following:

    • New Schema Name: The name of the schema that will map to the table in the data lake once the schema is published.

      • The name will always start with Custom. and must have a capital letter after.

    • S3 Prefix: Use an existing prefix that was set up prior to inferring the schema or a new prefix.

      • The prefix you choose will filter data from the corresponding prefix in the S3 bucket to the schema you've inferred.

  3. Click Infer Schema.

    • At the top of the page, you will see '<schema name>' was successfully inferred.

    • The schema will then be placed in a Draft mode until you're ready to publish to production after testing.

Test the schema with raw data

Once your schemas and prefixes are defined, you can proceed to testing the schema configuration against raw data.

    • Once the test is started, the results appear with the amount of matched and unmatched events.

      • Matched Events represent the number of events that would successfully classify against the schema configuration.

      • Unmatched Events represent the number of events that would not classify against the schema.

    • Click Back to Schemas, make changes as needed, and test the schema again.

  1. Click Back to Schemas.

    • The inferred schema is now attached to your log source.

Inferring custom schemas from historical S3 data

You can infer and save one or multiple schemas for a custom S3 log source from historical data in your S3 bucket (i.e., data that was added to the bucket before it was onboarded as a log source in Panther).

Prerequisite: Onboard your S3 bucket to Panther

Step 1: View the S3 bucket structure in Panther

After creating your S3 bucket source in Panther, you can view your S3 bucket's structure and data in the Panther Console:

  1. In the Panther Console, navigate to Configure > Log Sources. Click into your S3 log source.

  2. In the log source's Overview tab, scroll down to the Attach a Schema to start classifying the data section.

  3. On the right side of the I want to generate a schema from bucket data tile, click Start.

    • You will be redirected to a folder inspection of your S3 bucket. Here, you can view and navigate through all folders and objects in the S3 bucket.

Step 2: Navigate through your data

  • While viewing the folder inspection, click an object.

    • A preview window will appear, displaying a preview of its events:

If the events fail to render correctly (either generating an error or displaying events improperly), it's possible the wrong stream type has been chosen for the S3 bucket source. If this is the case, click Selected Logs Format is n:

Step 3: Indicate whether each folder has an existing schema or a new one should be inferred

After reviewing what's included in your bucket, you can determine whether one or multiple schemas are necessary to represent all of the bucket's data. Next, you can select folders that contain data with distinct structures and either infer a new schema or assign an existing one.

  1. Determine whether one or more schemas will need to be inferred from the data in your S3 bucket.

    • If all data in the S3 bucket is of the same structure (and therefore can be represented by one schema), you can leave the default Infer New Schema option selected on the bucket level. This generates a single schema for all data in the bucket.

    • If the S3 bucket includes data that needs to be classified into multiple schemas, follow the steps below for each folder in the bucket:

      1. Select a folder and click Include.

        • By default, each newly included folder has the Infer New Schema option selected.

Step 4: Wait for schemas to be inferred

The schema inference process may take up to 15 minutes. You can leave this page while the process completes. You can also stop this process early, and keep the schema(s) inferred during the time that the process ran.

Step 5: Review the results

After the inference process is complete, you can view the resulting schemas and the number of events that were used during each schema's inference. You can also validate how each schema parses raw events.

  1. Click the play icon on the right side of each row.

  2. Click the Events tab to see the raw and normalized events.

  3. Click the Schema tab to see the generated schema.

Step 6: Name the schema(s) and save source

Before saving the source, name each of the newly inferred schemas with a unique name by clicking Add name.

After all new schemas have been named, you will be able to click Save Source in the upper right corner.

Inferring a custom schema from HTTP data received in Panther

View raw HTTP data

    • Do not select a schema during HTTP source setup.

  1. Choose from the following options:

    • I want to add an existing schema: Choose this option if you already created a schema. Click Start in the tile.

      • You will be navigated to the HTTP source edit page, where you can make a selection in the Schemas - Optional field:

    • I want to generate a schema: Select this option to generate a schema from live data. Click Start in the tile.

      • Note that you may need to wait a few minutes after POSTing the events to the HTTP endpoint for them to be visible in Panther.

      • On the page you are directed to, under Raw Events, you can view the raw data Panther has received within the last week:

      • This data is displayed from data-archiver, a Panther-managed S3 bucket that retains raw HTTP source logs for 15 days.

Infer a schema from raw data

If you chose I want to generate a schema in the previous section, you can now infer a schema.

  1. On the Infer New Schema modal that pops up, enter the:

    • New Schema Name: Enter a descriptive name. It will always start with Custom. and must have a capital letter after.

  2. Click Infer Schema.

    • At the top of the page, you will see '<schema name>' was successfully inferred.

    • The schema will be placed in Draft mode until you're ready to publish it, after testing.

Test the schema with raw data

Once your schema is defined, you can proceed to test the schema configuration against raw data.

    • Once the test is started, the results appear with the amount of matched and unmatched events.

      • Matched Events represent the number of events that would successfully classify against the schema configuration.

      • Unmatched Events represent the number of events that would not classify against the schema.

    • Click Back to Schemas, make changes as needed, and test the schema again.

  1. Click Back to Schemas.

    • The inferred schema is now attached to your log source.

    • Log events that were sent to the HTTP source before it had a schema attached, which were used to infer the schema, are then ingested into Panther.

Create the schema yourself

How to create a custom schema manually

To create a custom schema manually:

  1. In the Panther Console, navigate to Configure > Schemas.

  2. Click Create New in the upper right corner.

  3. Enter a Schema ID, Description, and Reference URL.

    • The Description is meant for content about the table, while the Reference URL can be used to link to internal resources.

  4. In the Schema section, in the Create your schema from scratch tile, click Start.

  5. In the Parser section, if your schema requires a parser other than the default (JSON) parser, select it. Learn more about the other parser options on the following pages: Script Log Parser, Fastmatch Log Parser, Regex Log Parser, and CSV Log Parser.

  6. In the Fields & Indicators section, write or paste your YAML log schema fields.

  7. (Optional) In the Universal Data Model section, define Core Field mappings for your schema.

  8. At the bottom of the window, click Run Test to verify your schema contains no errors.

    • Note that syntax validation only checks the syntax of the Log Schema. It can still fail to save due to name conflicts.

  9. Click Save.

You can now navigate to Configure > Log Sources and add a new source or modify an existing one to use the new Custom.SampleAPI Log Type. Once Panther receives events from this source, it will process the logs and store them in the custom_sampleapi table.

Writing schemas

See the tabs below for instructions on writing schemas for JSON logs and for text logs.

Writing a schema for JSON logs

To parse log files where each line is JSON, you must define a log schema that describes the structure of each log entry.

In the example schemas below, the first tab displays the JSON log structure and the second tab shows the Log Schema.

Note: Please leverage the Minified JSON Log Example when using the pantherlog tool or generating a schema within the Panther Console.

{
  "method": "GET",
  "path": "/-/metrics",
  "format": "html",
  "controller": "MetricsController",
  "action": "index",
  "status": 200,
  "params": [],
  "remote_ip": "1.1.1.1",
  "user_id": null,
  "username": null,
  "ua": null,
  "queue_duration_s": null,
  "correlation_id": "c01ce2c1-d9e3-4e69-bfa3-b27e50af0268",
  "cpu_s": 0.05,
  "db_duration_s": 0,
  "view_duration_s": 0.00039,
  "duration_s": 0.0459,
  "tag": "test",
  "time": "2019-11-14T13:12:46.156Z"
}

Minified JSON log example:

{"method":"GET","path":"/-/metrics","format":"html","controller":"MetricsController","action":"index","status":200,"params":[],"remote_ip":"1.1.1.1","user_id":null,"username":null,"ua":null,"queue_duration_s":null,"correlation_id":"c01ce2c1-d9e3-4e69-bfa3-b27e50af0268","cpu_s":0.05,"db_duration_s":0,"view_duration_s":0.00039,"duration_s":0.0459,"tag":"test","time":"2019-11-14T13:12:46.156Z"}

fields:
- name: time
  description: Event timestamp
  required: true
  type: timestamp
  timeFormats: 
   - rfc3339
  isEventTime: true
- name: method
  description: The HTTP method used for the request
  type: string
- name: path
  description: The path used for the request
  type: string
- name: remote_ip
  description: The remote IP address the request was made from
  type: string
  indicators: [ ip ] # the value will be appended to `p_any_ip_addresses` if it's a valid ip address
- name: duration_s
  description: The number of seconds the request took to complete
  type: float
- name: format
  description: Response format
  type: string
- name: user_id
  description: The id of the user that made the request
  type: string
- name: params
  type: array
  element:
    type: object
    fields:
    - name: key
      description: The name of a Query parameter
      type: string
    - name: value
      description: The value of a Query parameter
      type: string
- name: tag
  description: Tag for the request
  type: string
- name: ua
  description: UserAgent header
  type: string

Writing a schema for text logs

Panther handles logs that are not structured as JSON by using a 'parser' that translates each log line into key/value pairs and feeds it as JSON to the rest of the pipeline. You can define a text parser using the parser field of the Log Schema. Panther provides the following parsers for non-JSON formatted logs:

  • fastmatch: Match each line of text against one or more simple patterns.

  • regex: Use regular expression patterns to handle more complex matching, such as conditional fields and case-insensitive matching.

  • csv: Treat log files as CSV, mapping column names to field names.

  • starlark (Script Log Parser): Parse text logs, or perform transformations on JSON logs.
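For orientation only, the sketch below shows how a parser block sits alongside the fields it extracts, assuming the fastmatch parser and a hypothetical match pattern and field names; refer to the Fastmatch Log Parser page for the authoritative configuration options:

parser:
  fastmatch:
    match:
      - '%{timestamp} %{level} %{message}'
fields:
- name: timestamp
  required: true
  type: timestamp
  timeFormats:
    - rfc3339
  isEventTime: true
- name: level
  type: string
- name: message
  type: string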

Schema field suggestions

When creating or editing a custom schema, you can use field suggestions generated by Panther. To use this functionality:

  1. In the Panther Console, click into the YAML schema editor.

    • To edit an existing schema, click Configure > Schemas > [name of schema you would like to edit] > Edit.

    • To create a new schema, click Configure > Schemas > Create New.

  2. Press Command+I on macOS (or Control+I on PC).

    • The schema editor will display available properties and operations based on the position of the text cursor.

Managing custom schemas

Editing a custom schema

Panther allows custom schemas to be edited. Specifically, you can perform the following actions:

  • Add new fields.

  • Rename or delete existing fields.

  • Edit, add, or remove all properties of existing fields.

  • Modify the parser configuration to fix bugs or add new patterns.

Note: After editing a field's type, any newly ingested data will match the new type while any previously ingested data will retain its type.

To edit a custom schema:

  1. Navigate to your custom schema's details page in the Panther Console.

  2. Click Edit in the upper-right corner of the details page.

  3. Modify the schema.

    • To more easily see your changes (or copy or revert deleted lines), click Single Editor, then Diff View.

  4. In the upper-right corner, click Update.

Click Run Test to check the YAML for structural compliance. Note that the rules will only be checked after you click Update. The update will be rejected if the rules are not followed.

Update related detections and saved queries

Editing schema fields might require updates to related detections and saved queries. Click Related Detections in the alert banner displayed above the schema editor to view, update, and test the list of affected detections and saved queries.

Query implications

Queries will work across changes to a Type provided the query does not use a function or operator which requires a field type that is not castable across Types.

  • Good example: The Type is edited from string to int where all existing values are numeric (e.g., "1"). A query using the function sum aggregates old and new values together.

  • Bad example: The Type is edited from string to int where some of the existing values are non-numeric (e.g., "apples"). A query using the function sum excludes values that are non-numeric.
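For example, the "good" edit above corresponds to a change like the following in the schema's field definition (the field name items_sold is hypothetical):

# Before the edit: values such as "1" are ingested as strings
- name: items_sold
  type: string

# After the edit: newly ingested values are stored as integers,
# and sum() aggregates old and new values together
- name: items_sold
  type: int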

Query castability table

This table shows which Types can be cast as each Type when running a query. Schema editing allows any Type to be changed to another Type.

Type From -> To    boolean   string   int            bigint         float          timestamp
boolean            same      yes      yes            yes            no             no
string             yes       same     numbers only   numbers only   numbers only   numbers only
int                yes       yes      same           yes            yes            numbers only
bigint             yes       yes      yes            same           yes            numbers only
float              yes       yes      yes            yes            same           numbers only
timestamp          no        yes      no             no             no             same

Archiving and unarchiving a custom schema

You can archive and unarchive custom schemas in Panther. You might choose to archive a schema if it's no longer used to ingest data, and you do not want it to appear as an option in various dropdown selectors throughout Panther. In order to archive a schema, it must not be in use by any log sources. Schemas that have been archived still exist indefinitely; it is not possible to permanently delete a schema.

Attempting to create a new schema with the same name as an archived schema will result in a name conflict, and prompt you to instead unarchive and edit the existing schema.

To archive or unarchive a custom schema:

  1. In the Panther Console, navigate to Configure > Schemas.

    • Locate the schema you'd like to archive or unarchive.

  2. On the right-hand side of the schema's row, click the Archive or Unarchive icon.

  3. On the confirmation modal, click Continue.

Testing a custom schema

Sample logs in any of the supported stream types can also be compressed using the following formats:

  • gzip

  • zstd (without dictionary)

Multi-line logs are supported for JSON and JSONArray formats.

To validate that a custom schema will work against your logs, you can test it against sample logs:

  1. In the left-hand navigation bar in your Panther Console, click Configure > Schemas.

  2. Click on a custom schema's name.

  3. In the upper-right corner of the schema details page, click Test Schema.

Enabling field discovery

Log source schemas in Panther define the log event fields that will be stored in Panther. When field discovery is enabled, data from fields in incoming log events that are not defined in the corresponding schema will not be dropped—instead, the fields will be identified, and the data will be stored. This means you can subsequently query data from these fields, and write detections referencing them.

Handling of special characters in field names

If the name of a discovered field contains a special character (i.e., a character that is not alphanumeric, an underscore (_), or a dash (-)), it will be transliterated using the algorithm below:

  • @ to at_sign

  • , to comma

  • ` to backtick

  • ' to apostrophe

  • $ to dollar_sign

  • * to asterisk

  • & to ampersand

  • ! to exclamation

  • % to percent

  • + to plus

  • / to slash

  • \ to backslash

  • # to hash

  • ~ to tilde

  • = to eq

All other ASCII characters (including space) will be replaced with an underscore (_). Non-ASCII characters are transliterated to their closest ASCII equivalent.

This transliteration affects only field names; values are not modified.

Limitations

Field discovery currently has the following limitations:

  • The maximum number of top-level fields that can be discovered is 2,000. Within each object field, a maximum of 1,000 fields can be discovered.

    • There is no limitation on the number of overall fields discovered.

Uploading log schemas with the Panther Analysis Tool

The uploader command receives a base path as an argument and then proceeds to recursively discover all files with extensions .yml and .yaml.

It is recommended to keep schema files separate from other, unrelated files; otherwise, you may see errors caused by attempting to upload non-schema YAML files as schemas.

panther_analysis_tool update-custom-schemas --path ./schemas

The uploader checks whether a schema with the same name already exists and updates it; if no matching schema name is found, it creates a new one.

The uploaded files are validated with the same criteria as Web UI updates.
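For reference, a minimal schema file that the command above would discover might look like the sketch below; the schema name and fields are placeholders, and the full set of supported keys is documented in the Log Schema Reference:

schema: Custom.SampleAPI
fields:
- name: time
  required: true
  type: timestamp
  timeFormats:
    - rfc3339
  isEventTime: true
- name: action
  type: string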

Troubleshooting Custom Logs

If you have deduced that you need more than one schema and you'd like to use Panther's schema inference tools to generate them, it's recommended to do one of the following:

  • Use the Inferring a custom schema from sample logs method multiple times with samples from different log types

  • Send differently structured data to separate folders in an S3 bucket, then use the Inferring custom schemas from historical S3 data inference method

If you use either the Inferring a custom schema from S3 data received in Panther or Inferring a custom schema from HTTP data received in Panther methods, you risk Panther generating a single schema that represents all log types sent to the source.

See Automatically infer the schema in Panther.

See Create the schema yourself.

To infer a schema from sample data you've uploaded, see the Inferring a custom schema from sample logs tab.

To infer a schema from S3 data received in Panther, see the Inferring a custom schema from S3 data received in Panther tab.

To infer one or more schemas from historical S3 data, see the Inferring custom schemas from historical S3 data tab.

To infer a schema from HTTP data received in Panther, see the Inferring a custom schema from HTTP data received in Panther tab.

Use the pantherlog infer command.

You can generate a schema by uploading sample logs into the Panther Console. If you'd like to use the command line instead, follow the instructions for using the pantherlog CLI tool.

Optionally enable Field Discovery by clicking its toggle ON. Learn more in Enabling field discovery.

Paste sample event(s): Directly paste or type sample events into the editor.

Select the appropriate Stream Type.

Once the schema is generated, it will appear in the schema editor box.

To see the test results, click View Events.

You can generate and publish a schema for a custom log source from live data streaming from an S3 bucket into Panther. You will first view your S3 data in Panther, then infer a schema, then test the schema.

Follow the instructions to onboard an S3 bucket onto Panther without having a schema in place.

While viewing your log source's Overview tab, scroll down to the Attach a schema to start classifying data section.

You will see an S3 Prefixes & Schemas popup modal:

On the page you are directed to, you can view the raw data Panther has received at the bottom of the screen:

Click Infer Schema to generate a schema.

If you don't need to specify a specific prefix, you can leave this field empty to use the catch-all prefix that is called *.

Click Done.

Review the schema and its fields by clicking its name.

Since the schema is in Draft, you can change, remove, or add fields as needed.

In the Test Schemas section at the top of the screen, click Run Test.

On the Test Schemas modal that pops up, select the Time Period you would like to test your schema against, then click Start Test.

Depending on the time range and amount of data, the test may take a few minutes to complete.

If there are Unmatched Events, inspect the errors and the JSON to decipher what caused the failures.

In the upper right corner, click Save.

Follow the instructions to onboard an S3 bucket onto Panther without having a schema in place.

If you have onboarded the S3 source with a custom IAM role, that role must have the ListBucket permission.

Alternatively, you can access the folder inspection of your S3 bucket via the success page after onboarding your S3 source in Panther. From that page, click Attach or Infer Schemas.

Alternatively, if there is a folder or subfolder that you do not want Panther to process, select it and click Exclude.

If you have an existing schema that matches the data, click the Schema dropdown on the right side of the row, then select the schema:

Click Infer n Schemas.

You can generate and publish a schema for a custom log source from live data streaming from an HTTP (webhook) source into Panther. You will first view your HTTP data in Panther, then infer a schema, then test the schema.

After creating your HTTP source in Panther, you can view raw data coming into Panther and infer a schema from it:

Follow the instructions to set up an HTTP log source in Panther.

While viewing your log source's Overview tab, scroll down to the Attach a schema to start classifying data section.

Once you see data populating within Raw Events, click Infer Schema.

Click Done.

Click the draft schema's name to review its inferred fields.

Since the schema is in Draft, you can add, remove, and otherwise change fields as needed.

In the Test Schemas section at the top of the screen, click Run Test.

In the Test Schemas pop-up modal, select the Time Period you would like to test your schema against, then click Start Test.

Depending on the time range and amount of data, the test may take a few minutes to complete.

If there are Unmatched Events, inspect the errors and the JSON to decipher what caused the failures.

In the upper right corner, click Save.

Optionally enable Automatic Field Discovery by clicking its toggle ON. Learn more in Enabling field discovery.

The Schema section will default to using Separate Sections. If you'd like to write your entire schema in one editor window, click Single Editor.

You can use Panther-generated schema field suggestions.

Learn more in Mapping Core Fields in Custom Log Schemas.

You can also now write detections to match against these logs and query them using Search or Data Explorer.

Note that you can use the pantherlog CLI tool to generate your Log Schema.

You can edit the YAML specifications directly in the Panther Console, or they can be prepared offline in your editor/IDE of choice. For more information on the structure and fields in a Log Schema, see the Log Schema Reference.

It's also possible to use the starlark parser with JSON logs to perform transformations outside of those that are natively supported by Panther.


You can use Panther-generated schema field suggestions.

Archiving a schema does not affect any data ingested using that schema that is already stored in the data lake; it is still queryable using Search and Data Explorer. By default, archived schemas are not shown in the schema list view (visible on Configure > Schemas), but can be shown by modifying Status, within Filters, in the upper right corner. In Data Explorer, tables of archived schemas are not shown under Tables.

If you are archiving a schema and it is currently associated to one or more log sources, the confirmation modal will prompt you to first detach the schema. Once you have done so, click Refresh.

The "Test Schema against sample logs" feature found on the Schema Edit page in the Panther Console supports Lines, CSV (with or without headers), JSON, JSON Array, CloudWatch Logs, and Auto. See for examples.

Field discovery is currently only available for custom schemas, not Panther-managed ones. See the additional limitations of field discovery below.

If your schema uses the csv parser and you are parsing CSV logs without a header, only fields included in the columns section of your schema will be discovered.

This does not apply if your schema uses the csv parser and you are parsing CSV logs with a header.

If your schema uses the fastmatch parser, only fields defined inside the match patterns will be discovered.

If your schema uses the regex parser, only fields defined inside the match patterns will be discovered.

If you choose to maintain your log schemas outside of Panther, for example in order to keep them under version control and review changes before updating, you can upload the YAML files programmatically with the Panther Analysis Tool.

The schema field must always be defined in the YAML file and be consistent with the existing schema name for an update to succeed. For a list of all available CI/CD fields, see our Log Schema Reference.

Visit the Panther Knowledge Base to view articles about custom log sources that answer frequently asked questions and help you resolve common errors and issues.
